Machine_Learning-MBA-unit-3
Module 3
Introduction to Machine Learning
Definition of learning
Definition
A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks T, as measured by P, improves with experience E.
Examples
i) Handwriting recognition learning problem
• Task T: Recognising and classifying handwritten words within images
• Performance P: Percent of words correctly classified
• Training experience E: A dataset of handwritten words with given classifications
ii) A robot driving learning problem
• Task T: Driving on highways using vision sensors
• Performance measure P: Average distance traveled before an error
• Training experience E: A sequence of images and steering commands recorded while observing a human driver
iii) A chess learning problem
• Task T: Playing chess
• Performance measure P: Percent of games won against opponents
• Training experience E: Playing practice games against itself
Thus machine learning (ML) is a category of algorithms that allows software applications to become more accurate in
predicting outcomes without being explicitly programmed. The basic premise of machine learning is to build algorithms that can
receive input data and use statistical analysis to predict an output, while updating outputs as new data becomes available.
Examples of ML in use
Prediction — Machine learning can also be used in prediction systems. Considering the loan example, to compute the
probability of a default, the system will need to classify the available data into groups.
Image recognition — Machine learning can be used for face detection in an image as well. There is a separate category for
each person in a database of several people.
Speech recognition — This is the translation of spoken words into text. It is used in voice searches and more. Voice user
interfaces include voice dialing, call routing, and appliance control. It can also be used for simple data entry and the
preparation of structured documents.
Financial industry and trading — Companies use ML in fraud investigation and credit checks.
Statistical models are used by organizations to transform data into business insights. Statistical modelling is a method of
mathematically approximating the world. Statistical models are designed to find relationships between variables and the
significance of those relationships, whilst predicting future values. For example, marketers use statistical modelling to divide
customers into different segments based on various factors (e.g. priorities, demographic information, needs), so that they
can implement specific marketing strategies for different segments.
Though handcrafted with a high level of interpretability, statistical models have limited predictive
accuracy, since the underlying assumptions of the model are sometimes far too strict to represent reality and since statistics can
process only numeric data. Today's businesses are adopting hybrid methods that combine characteristics of statistical
modelling and machine learning, so as to understand in depth how the underlying models work as well as generate accurate
predictions.
2000-2010: ML Triggers Business Interest
The beginning of the 21st century saw more and more businesses shifting their attention towards ML. Tech
companies like Google started realising ML's potential in applying complex mathematical calculations to big
data. They invested considerable resources and researched heavily in the field to stay ahead of their
competitors. In 2014, Google acquired the AI startup DeepMind for $500M, making it Google's largest
European acquisition to date.
Currently, the most common ML applications are customer-centric, spanning everything from improving in-store
retail experiences with IoT to boosting security with biometric data to predicting and diagnosing disease. The
table below presents some examples of ML use cases in different industries that are driving business value
today.
"The future is already here. It's just not evenly distributed." — William Gibson
Although ML is being implemented more widely due to its proven value, it is not equally adopted across businesses. Only big
companies which can hire scarce data science and ML talent, and invest enormously in sophisticated IT
infrastructure, are greatly benefiting from ML. In addition, ML adoption in business faces other challenges such
as data management, model explainability, use-case identification, internal change resistance and ethical
concerns.
(a) ML development tools: Automated ML for Everyone
Though ML has enormous potential in business applications, most companies are still in the nascent stages of ML
adoption. As with Business Intelligence and other tech industries, we will only see exponential growth of ML business
adoption when core ML development platforms and tools become affordable and available to every company.
Tech giants and leading AI tool startups are automating ML to democratise it for everyone. Microsoft
Azure, Amazon SageMaker and Google AutoML are cloud-based ML platforms that enable data science professionals
to build models and operationalise ML insights. Google introduced AutoML-Zero in a 2020 paper, which
shows that automatically building ML algorithms from scratch is possible using evolutionary techniques.
Several challenges remain for business adoption of ML:
Deployment: ML applications need faster lifecycles, since data changes so rapidly in the real business world
that companies might need to deploy a new model every day. Businesses also need ML development tools
that can easily integrate with their existing systems.
Data Privacy and Security: In many business applications, effective ML models are usually trained on
sensitive data protected by strict regulations. Privacy-preserving ML is increasingly needed to ensure data
security and privacy.
Explainability: ML models are often seen as a black box because of their lack of transparency. In business,
knowing the "why" is as important as predicting the "what". Unless the "why" behind an ML model's decision
can be explained to stakeholders, its applications will be strictly limited, even if it achieves higher
accuracy than traditional methods.
Model validation: Data changes fast in a business environment. ML models need to be validated to ensure
that they predict accurately on new data.
ML ethics and bias: ML models and their training data are necessarily biased. Knowing the different biases present
in the ML life cycle and their potential consequences requires a transparent process. Reducing or even avoiding
discriminatory bias requires bias testing in the development cycle as well as monitoring and reviewing models
in operation.
Components of Learning
Basic components of learning process
The learning process, whether by a human or a machine, can be divided into four components,
namely, data storage, abstraction, generalization and evaluation. Figure 1.1 illustrates the various
components and the steps involved in the learning process.
1. Data storage
Facilities for storing and retrieving huge amounts of data are an important component of the
learning process. Humans and computers alike utilize data storage as a foundation for advanced
reasoning.
• In a human being, the data is stored in the brain and data is retrieved using electrochemical signals.
• Computers use hard disk drives, flash memory, random access memory and similar devices to store
data and use cables and other technology to retrieve data.
2. Abstraction
The second component of the learning process is known as abstraction.
Abstraction is the process of extracting knowledge about stored data. This involves creating general
concepts about the data as a whole. The creation of knowledge involves application of known models
and creation of new models.
The process of fitting a model to a dataset is known as training. When the model has been trained, the
data is transformed into an abstract form that summarizes the original information.
3. Generalization
The third component of the learning process is known as generalisation.
The term generalization describes the process of turning the knowledge about stored data into a form
that can be utilized for future action. These actions are to be carried out on tasks that are similar, but
not identical, to those that have been seen before. In generalization, the goal is to discover those
properties of the data that will be most relevant to future tasks.
4. Evaluation
Evaluation is the last component of the learning process.
It is the process of giving feedback to the user to measure the utility of the learned knowledge. This
feedback is then utilized to effect improvements in the whole learning process.
Applications of Machine Learning
4. In medicine, learning programs are used for medical diagnosis.
5. In telecommunications, call patterns are analyzed for network optimization and maximizing the
quality of service.
6. In science, large amounts of data in physics, astronomy, and biology can only be analyzed fast
enough by computers. The World Wide Web is huge; it is constantly growing, and searching it for
relevant information cannot be done manually.
7. In artificial intelligence, it is used to teach a system to learn and adapt to changes so that the
system designer need not foresee and provide solutions for all possible situations.
8. It is used to find solutions to many problems in vision, speech recognition, and robotics.
9. Machine learning methods are applied in the design of computer-controlled vehicles to steer
correctly when driving on a variety of roads.
10. Machine learning methods have been used to develop programmes for playing games such as
chess, backgammon and Go.
Types of Learning
In general, machine learning algorithms can be classified into three types.
Supervised learning
Unsupervised learning
Reinforcement learning
Supervised learning
A training set of examples with the correct responses (targets) is provided and, based on this
training set, the algorithm generalises to respond correctly to all possible inputs. This is also called
learning from exemplars. Supervised learning is the machine learning task of learning a function that
maps an input to an output based on example input-output pairs.
In supervised learning, each example in the training set is a pair consisting of an input object
(typically a vector) and an output value. A supervised learning algorithm analyzes the training data and
produces a function, which can be used for mapping new examples. In the optimal case, the function
will correctly determine the class labels for unseen instances. Both classification and regression
problems are supervised learning problems. A wide range of supervised learning algorithms are
available, each with its strengths and weaknesses. There is no single learning algorithm that works best
on all supervised learning problems.
Remarks
Supervised learning is so called because the process of an algorithm learning from the
training dataset can be thought of as a teacher supervising the learning process. We know the correct
answers (that is, the correct outputs); the algorithm iteratively makes predictions on the training data
and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of
performance.
Example
Consider the following data regarding patients entering a clinic. The data consists of the gender
and age of the patients and each patient is labeled as “healthy” or “sick”.
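A minimal sketch of this idea, assuming scikit-learn is available; the gender/age values and the "healthy"/"sick" labels below are hypothetical, since the original table is not reproduced here. The point is only that the algorithm learns a mapping from labelled input-output pairs and then applies it to an unseen patient.
# Supervised learning sketch on hypothetical clinic data (illustrative values only)
from sklearn.tree import DecisionTreeClassifier

# Each training example is a pair: input (gender encoded as 0=male, 1=female, age) and label.
X = [[0, 48], [1, 60], [0, 23], [1, 36], [0, 67]]          # hypothetical patients
y = ["sick", "healthy", "healthy", "healthy", "sick"]      # hypothetical labels

model = DecisionTreeClassifier()
model.fit(X, y)                     # learn a function from the input-output pairs

print(model.predict([[1, 50]]))     # predicted label for an unseen patient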
Unsupervised learning
Correct responses are not provided, but instead the algorithm tries to identify similarities
between the inputs so that inputs that have something in common are categorised together. The
statistical approach to unsupervised learning is known as density estimation.
Unsupervised learning is a type of machine learning algorithm used to draw inferences from
datasets consisting of input data without labeled responses. In unsupervised learning algorithms, a
classification or categorization is not included in the observations. There are no output values and so
there is no estimation of functions. Since the examples given to the learner are unlabeled, the accuracy
of the structure that is output by the algorithm cannot be evaluated. The most common unsupervised
learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns or groupings in data.
Example
Consider the following data regarding patients entering a clinic. The data consists of the
gender and age of the patients.
Based on this data, can we infer anything regarding the patients entering the clinic?
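A minimal sketch of the unsupervised counterpart, assuming scikit-learn is available; the gender/age values are again hypothetical. Note that no "healthy"/"sick" labels are supplied, so the algorithm can only group similar patients together.
# Unsupervised learning sketch: cluster hypothetical clinic data without labels
from sklearn.cluster import KMeans

# Inputs only (gender encoded as 0/1, age); there are no output labels.
X = [[0, 48], [1, 60], [0, 23], [1, 36], [0, 67]]          # hypothetical patients

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)      # cluster index (0 or 1) assigned to each patient

print(labels)                       # groupings of similar patients, not diagnoses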
Reinforcement learning
This is somewhere between supervised and unsupervised learning. The algorithm gets told
when the answer is wrong, but does not get told how to correct it. It has to explore and try out
different possibilities until it works out how to get the answer right. Reinforcement learning is
sometimes called learning with a critic, because this monitor scores the answer but does not
suggest improvements.
Example
Consider teaching a dog a new trick: we cannot tell it what to do, but we can reward/punish
it if it does the right/wrong thing. It has to find out what it did that made it get the
reward/punishment. We can use a similar method to train computers to do many tasks, such as
playing backgammon or chess, scheduling jobs, and controlling robot limbs. Reinforcement learning
is different from supervised learning. Supervised learning is learning from examples provided by a
knowledgeable expert.
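A minimal plain-Python sketch of the "learning with a critic" idea. The environment below is hypothetical: the learner is never told which action is correct, it only receives a numeric reward and gradually discovers which action earns the most.
# Reinforcement-style learning sketch: estimate the value of each action from rewards only
import random

actions = ["left", "right", "forward"]
value = {a: 0.0 for a in actions}       # estimated average reward of each action
counts = {a: 0 for a in actions}

def critic(action):
    # Hypothetical critic: "forward" is usually rewarded, but the learner is never told this.
    return 1.0 if action == "forward" and random.random() < 0.8 else 0.0

for step in range(1000):
    # Explore occasionally; otherwise exploit the action that currently looks best.
    if random.random() < 0.1:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda x: value[x])
    r = critic(a)
    counts[a] += 1
    value[a] += (r - value[a]) / counts[a]   # incremental average of observed rewards

print(value)    # the learner discovers that "forward" earns the highest average reward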
Some fundamental questions that machine learning research tries to answer include the following.
• What algorithms exist for learning general target functions from specific training examples? In what settings will particular algorithms converge to the desired function, given sufficient training data? Which algorithms perform best for which types of problems and representations?
• How much training data is sufficient? What general bounds can be found to relate the confidence in learned hypotheses to the amount of training experience and the character of the learner's hypothesis space?
• When and how can prior knowledge held by the learner guide the process of generalizing from examples? Can prior knowledge be helpful even when it is only approximately correct?
• What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy alter the complexity of the learning problem?
• What is the best way to reduce the learning task to one or more function approximation problems? Put another way, what specific functions should the system attempt to learn? Can this process itself be automated?
• How can the learner automatically alter its representation to improve its ability to represent and learn the target function?
https://theintactone.com/2021/09/14/kmbnit02-ai-and-ml-for-business/
UNIT – I
Introduction to Python
Python is a widely used general-purpose, high-level programming language. It was initially designed by Guido van Rossum
in 1991 and is developed by the Python Software Foundation. It was mainly developed with an emphasis on code readability, and its
syntax allows programmers to express concepts in fewer lines of code.
Python is a programming language that lets you work quickly and integrate systems more efficiently.
• On 16 October 2000, Python 2.0 was released with many new features.
• On 3 December 2008, Python 3.0 was released, with more testing and new features.
# Script Begins
Statement1
Statement2
Statement3
# Script Ends
1. Python is object-oriented
Programs are written in a natural, object-oriented way rather than a purely procedural way.
Eg: seeing Student as a class and defining operations over objects of that class type.
2. It's free (open source)
Downloading and installing Python is free and easy.
3. It's powerful
Built-in types and tools
Library utilities
Third-party utilities (e.g. Numeric, NumPy, SciPy)
Automatic memory management
4. It's portable
Python runs on virtually every major platform used today.
As long as you have a compatible Python interpreter installed, Python programs will run in exactly the same manner, irrespective of platform.
5. It's easy to use and learn
No intermediate compile step is needed.
Python programs are compiled automatically to an intermediate form called byte code, which the interpreter then reads.
This gives Python the development speed of an interpreter without the performance loss inherent in purely interpreted languages.
Structure and syntax are pretty intuitive and easy to grasp.
6. Interpreted language
Python is processed at runtime by the Python interpreter.
7. Interactive programming language
Users can interact with the Python interpreter directly to write their programs.
8. Straightforward syntax
The formation of Python syntax is simple and straightforward, which also makes it popular.
Running Python in interactive mode:
Without passing a Python script file to the interpreter, you can directly execute code at the Python prompt. Once you are inside the
Python interpreter, you can start typing:
>>> x=[0,1,2]
>>> x          # If a quantity is stored in memory, typing its name will display it.
[0, 1, 2]
>>> 2+3
5
The chevron at the beginning of the line, i.e., the symbol >>>, is the prompt the Python interpreter uses to indicate that
it is ready. If the programmer types 2+3, the interpreter replies 5.
Int:
Int, or integer, is a whole number, positive or negative, without decimals, of unlimited length.
>>> print(24656354687654+2)
24656354687656
>>> print(20)
20
>>> a=10
>>> print(a)
10
# To verify the type of any object in Python, use the type() function:
>>> type(10)
<class 'int'>
>>> a=11
>>> print(type(a))
<class 'int'>
Float:
Float, or "floating point number", is a number, positive or negative, containing one or more decimals.
A float can also be a scientific number with an "e" to indicate the power of 10.
>>> y=2.8
>>> y
2.8
>>> y=2.8
>>> print(type(y))
<class 'float'>
>>> type(.4)
<class 'float'>
>>> 2.
2.0
Example:
x = 35e3
y = 12E4
z = -87.7e100
print(type(x))
print(type(y))
print(type(z))
Output:
<class 'float'>
<class 'float'>
<class 'float'>
Boolean:
Objects of Boolean type may have one of two values, True or False:
>>> type(True)
<class 'bool'>
>>> type(False)
<class 'bool'>
String:
1. Strings in Python are identified as a contiguous set of characters represented in the quotation marks. Python
allows for either pairs of single or double quotes.
• Strings can be output to screen using the print function. For example: print("hello").
>>> print("mrcet college")
mrcet college
>>> type("mrcet college")
<class 'str'>
If you want to include either type of quote character within the string, the simplest way is to delimit the string with the
other type. If a string is to contain a single quote, delimit it with double quotes and vice versa:
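For example:
>>> print("It's a string")
It's a string
>>> print('He said "hello"')
He said "hello"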
Based on the data type of a variable, the interpreter allocates memory and decides what can be stored in the reserved
memory. Therefore, by assigning different data types to variables, you can store integers, decimals or characters in these
variables.
• A variable name can only contain alphanumeric characters and underscores (A-Z, a-z, 0-9, and _)
• Variable names are case-sensitive (age, Age and AGE are three different variables)
Assigning Values to Variables:
Python variables do not need explicit declaration to reserve memory space. The declaration happens automatically when
you assign a value to a variable. The equal sign (=) is used to assign values to variables.
The operand to the left of the = operator is the name of the variable and the operand to the right of the = operator is the
value stored in the variable.
For example −
a = 100       # An integer assignment
b = 1000.0    # A floating point
c = "John"    # A string
print(a)
print(b)
print(c)
This gives the output:
100
1000.0
John
Multiple Assignment:
Python allows you to assign a single value to several variables simultaneously. For example :
a=b=c=1
Here, an integer object is created with the value 1, and all three variables are assigned to the same memory location. You
can also assign multiple objects to multiple variables.
For example −
a, b, c = 1, 2, "mrcet"
Here, two integer objects with values 1 and 2 are assigned to variables a and b respectively, and one string object with the
value "mrcet" is assigned to the variable c.
Output Variables:
Variables do not need to be declared with any particular type and can even change type after they have been set. For example:
x = "mrcet"
print(x)
Output: mrcet
To combine both text and a variable, Python uses the "+" character:
Example
x = "awesome"
print("Python is " + x)
Output:
Python is awesome
You can also use the + character to add a variable to another variable:
Example
x = "Python is "
y = "awesome"
z = x + y
print(z)
Output:
Python is awesome
Expressions:
An expression is a combination of values, variables, and operators. Expressions may be Arithmetic Expressions, Relational Expressions or Logical Expressions.
Arithmetic expressions are formed by using arithmetic operators, relational expressions are formed by relational operators and logical expressions are formed by using logical operators, as listed below.
Operators: In Python you can implement the following operations using the corresponding tokens.
Operation     Token    Category
add           +        Arithmetic
subtract      -        Arithmetic
multiply      *        Arithmetic
remainder     %        Arithmetic
or            or       Logical
Examples of Arithmetic Expressions
y = x + 17
c = x + y
The above expressions evaluate the expression on the right and assign the result to the variable on the left.
Precedence of Operators:
For example, x = 7 + 3 * 2; here, x is assigned 13, not 20 because operator * has higher precedence than +, so it first
multiplies 3*2 and then adds into 7.
Example 1:
>>> 3+4*2
11
>>> (10+10)*2
40
Example 2:
a = 20
b = 10
c = 15
d = 5
e = 0
e = (a + b) * c / d       # (30 * 15) / 5
print("Value of (a + b) * c / d is ", e)
e = ((a + b) * c) / d     # (30 * 15) / 5
print("Value of ((a + b) * c) / d is ", e)
e = (a + b) * (c / d)     # (30) * (15/5)
print("Value of (a + b) * (c / d) is ", e)
e = a + (b * c) / d       # 20 + (150/5)
print("Value of a + (b * c) / d is ", e)
Output:
C:/Users/MRCET/AppData/Local/Programs/Python/Python38-32/pyyy/opprec.py
Value of (a + b) * c / d is  90.0
Value of ((a + b) * c) / d is  90.0
Value of (a + b) * (c / d) is  90.0
Value of a + (b * c) / d is  50.0
Example: reading values with input()
a = int(input("Input Value"))
b = int(input("Input Value"))
c = input("Input Value")    # input() returns a string
k = a + b
print(k)
The if statement contains a logical expression using which data is compared, and a decision is made based on the result
of the comparison.
Syntax:
if expression:
statement(s)
If the boolean expression evaluates to TRUE, then the block of statement(s) inside the if statement is executed. If the
boolean expression evaluates to FALSE, then the first set of code after the end of the if statement(s) is executed.
if Statement Flowchart:
a = 3
if a > 2:
    print(a, "is greater")
print("done")
a = -1
if a < 0:
    print(a, "a is smaller")
print("Finish")
Output:
C:/Users/MRCET/AppData/Local/Programs/Python/Python38-32/pyyy/if1.py
3 is greater
done
-1 a is smaller
Finish
If-Else structure :
An else statement can be combined with an if statement. An else statement contains the block of code (false block)
that executes if the conditional expression in the if statement resolves to 0 or a FALSE value.
The else statement is optional and there can be at most one else statement following an if.
Syntax of if - else :
if test expression:
Body of if stmts
else:
Body of else stmts
If - else Flowchart :
Fig: Operation of if – else statement
Example of if - else:
# Program to illustrate the if..else statement
a = int(input('Enter the number'))
if a > 5:
    print("a is greater than 5")
else:
    print("a is smaller than or equal to 5")
Output:
C:/Users/MRCET/AppData/Local/Programs/Python/Python38-32/pyyy/ifelse.py
Enter the number 2
a is smaller than or equal to 5
----------------------------------------
Nested if : (If-elif-else):
The elif statement allows us to check multiple expressions for TRUE and execute a block of code
as soon as one of the conditions evaluates to TRUE. Similar to the else, the elif statement is
optional. However, unlike else, for which there can be at most one statement, there can be an
arbitrary number of elif statements following an if.
if test expression:
    Body of if stmts
elif test expression:
    Body of elif stmts
else:
    Body of else stmts
a = int(input('enter the number'))
b = int(input('enter the number'))
c = int(input('enter the number'))
if a > b:
    print("a is greater")
elif b > c:
    print("b is greater")
else:
    print("c is greater")
Output:
C:/Users/MRCET/AppData/Local/Programs/Python/Python38-32/pyyy/ifelse.py
enter the number 5
enter the number 2
enter the number 9
a is greater
>>>
C:/Users/MRCET/AppData/Local/Programs/Python/Python38-32/pyyy/ifelse.py
enter the number 2
enter the number 5
enter the number 9
c is greater
-----------------------------
# Program to illustrate the use of a nested if..else structure
# Initialize a variable and print "Got a true expression value" if it is 200, 150 or 100, and "Got a false expression value" otherwise
var = 100       # initialize a variable
if var == 200:
    print("1 - Got a true expression value")
    print(var)
elif var == 150:
    print("2 - Got a true expression value")
    print(var)
elif var == 100:
    print("3 - Got a true expression value")
    print(var)
else:
    print("4 - Got a false expression value")
    print(var)
print("Good bye!")
Output:
C:/Users/MRCET/AppData/Local/Programs/Python/Python38-32/pyyy/ifelif.py
3 - Got a true expression value
100
Good bye!
Problem: 1
To read the following details of an item (a sample solution sketch follows the list):
Item Code
Description
Quantity
Price
Amount
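One possible solution sketch, assuming the intended calculation is Amount = Quantity * Price:
# Read the item details and compute the amount (sketch; the calculation is an assumption)
item_code = input("Enter Item Code: ")
description = input("Enter Description: ")
quantity = int(input("Enter Quantity: "))
price = float(input("Enter Price: "))
amount = quantity * price
print("Item Code   :", item_code)
print("Description :", description)
print("Quantity    :", quantity)
print("Price       :", price)
print("Amount      :", amount)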
Unit 4
Supervised learning
A training set of examples with the correct responses (targets) is provided and, based on this
training set, the algorithm generalises to respond correctly to all possible inputs. This is also
called learning from examples. Supervised learning is the machine learning task of learning a
function that maps an input to an output based on example input-output pairs.
Supervised learning is so called because the process of an algorithm learning from the
training dataset can be thought of as a teacher supervising the learning process. We know the
correct answers (that is, the correct outputs); the algorithm iteratively makes predictions on the
training data and is corrected by the teacher. Learning stops when the algorithm achieves an
acceptable level of performance.
In supervised learning, each example in the training set is a pair consisting of an input object
(typically a vector) and an output value. A supervised learning algorithm analyzes the training
data and produces a function, which can be used for mapping new examples. In the optimal
case, the function will correctly determine the class labels for unseen instances.
Example
Consider the following data regarding patients entering a clinic. The data consists
of the gender and age of the patients and each patient is labeled as “healthy” or
“sick”.
Types of Supervised Learning
There are two types of Supervised Learning Methods – Classification and Regression.
Classification is a supervised machine learning method where the model tries to predict the
correct label of a given input data. In classification, the model is fully trained using the training
data, and then it is evaluated on test data before being used to perform prediction on new unseen
data.
For instance, an algorithm can learn to predict whether a given email is spam or ham (not spam),
as illustrated below.
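A minimal sketch of such a spam/ham classifier, assuming scikit-learn is available; the four training emails and their labels are made up for illustration.
# Text classification sketch: spam vs. ham with word counts and Naive Bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "meeting at 10 am tomorrow",
          "free offer click now", "project report attached"]   # made-up messages
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(emails)    # turn text into word-count features

model = MultinomialNB()
model.fit(X_train, labels)                    # train on the labelled examples

X_new = vectorizer.transform(["free prize offer"])
print(model.predict(X_new))                   # predicted label for an unseen email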
Machine Learning Classification Vs. Regression
There are four main categories of Machine Learning algorithms: supervised, unsupervised, semi-
supervised, and reinforcement learning.
Even though classification and regression are both from the category of supervised learning, they
are not the same.
The prediction task is a classification when the target variable is discrete. An application is the
identification of the underlying sentiment of a piece of text.
The prediction task is a regression when the target variable is continuous. An example can be the
prediction of the salary of a person given their education degree, previous work experience,
geographical location, and level of seniority.
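Continuing the salary example, a minimal regression sketch, assuming scikit-learn is available; the years-of-experience and salary figures below are purely illustrative.
# Regression sketch: predict a continuous target (salary) from experience
from sklearn.linear_model import LinearRegression

X = [[1], [3], [5], [7], [10]]              # years of experience (illustrative)
y = [30000, 42000, 55000, 68000, 90000]     # salary (illustrative figures)

reg = LinearRegression()
reg.fit(X, y)                               # fit a line to the data

print(reg.predict([[6]]))                   # predicted (continuous) salary for 6 years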
Healthcare
Training a machine learning model on historical patient data can help healthcare specialists
accurately analyze their diagnoses:
During the COVID-19 pandemic, machine learning models were implemented to efficiently
predict whether a person had COVID-19 or not.
Researchers can use machine learning models to predict new diseases that are more
likely to emerge in the future.
Education
Education is one of the domains dealing with the most textual, video, and audio data. This
unstructured information can be analyzed with the help of Natural Language technologies to
perform different tasks such as:
Agriculture
Agriculture is one of the most valuable pillars of human survival. Introducing sustainability can
help improve farmers' productivity at a different level without damaging the environment, for example:
Using classification models to predict which type of land is suitable for a given type of seed.
Predicting the weather to help farmers take proper preventive measures.
K-Nearest Neighbour
There are mainly two main classification tasks in Machine learning: binary, multi-class
classifications.
Binary Classification
In a binary classification task, the goal is to classify the input data into two mutually exclusive
categories. The training data in such a situation is labelled in a binary format: true and false;
positive and negative; 0 and 1; spam and not spam, etc., depending on the problem being tackled.
For instance, we might want to detect whether a given image is of a truck or a boat.
Logistic Regression, Support Vector Machine and Decision Tree algorithms are natively
designed for binary classification.
Multi-Class Classification
Multi-class classification, on the other hand, has more than two mutually exclusive class labels,
where the goal is to predict to which class a given input example belongs. In the following
case, the model correctly classified the image to be a plane.
Most of the binary classification algorithms can also be used for multi-class classification (see the sketch after this list). These
algorithms include but are not limited to:
Random Forest
Naive Bayes
K-Nearest Neighbours
Gradient Boosting
SVM
Logistic Regression.
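As an illustration of the K-Nearest Neighbours approach named in this section, here is a minimal multi-class sketch, assuming scikit-learn is installed; the built-in iris dataset (three flower species) is used purely as a convenient example.
# K-Nearest Neighbours sketch: classify by majority vote of the 5 closest training points
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)            # small built-in 3-class dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)                    # "training" just stores the labelled points

print("Test accuracy:", knn.score(X_test, y_test))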
Unit - 5
Unsupervised learning
Unsupervised learning, also known as unsupervised machine learning, uses machine learning
algorithms to analyze and cluster unlabeled datasets. These algorithms discover hidden
patterns or data groupings without the need for human intervention. Supervised machine
learning is generally used to classify data or make predictions, whereas unsupervised
learning is generally used to understand relationships within datasets. Supervised machine
learning is much more resource-intensive because of the need for labelled data. Clustering is an
unsupervised method that works on datasets in which there is no outcome (target) variable,
nor is anything known about the relationships between the observations; that is, the data is
unlabeled.
We can think of unsupervised learning problems as being divided into two categories: clustering
and association rules. Clustering is an unsupervised learning technique, which groups unlabeled
data points based on their similarity and differences.
Clustering
Clustering is used to identify groups of similar objects in datasets with two or more variable
quantities. In practice, this data may be collected from marketing, biomedical, or geospatial
databases, among many other places.
In machine learning too, we often group examples as a first step to understand a subject (data set) in a
machine learning system. Grouping unlabeled examples is called clustering. As the examples are
unlabeled, clustering relies on unsupervised machine learning.
Clustering is the task of dividing unlabeled data or data points into different clusters such that
similar data points fall in the same cluster, while points that differ from them fall in other clusters. In simple
words, the aim of the clustering process is to segregate groups with similar traits and assign them
into clusters.
K-Means Clustering
The k-means clustering algorithm
One of the most widely used clustering algorithms is k-means. It groups the data into k clusters
according to the similarities among the data points, where k is given as input to the algorithm.
Let us start with a simple example. Imagine we have 5 objects (say 5 people), and for each of them
we know two features (height and weight). We want to group them into k=2 clusters.
Our dataset will look like this:
Person      Height    Weight
Person 1     167        55
Person 2     120        32
Person 3     113        33
Person 4     175        76
Person 5     108        25
First of all, we have to initialize the value of the centroids for our clusters. For instance, let's
choose Person 2 and Person 3 as the two centroids c1 and c2, so that c1=(120,32) and
c2=(113,33). Now we compute the Euclidean distance between each of the two centroids and
each point in the data. If you did all the calculations, you should have come up with the
following numbers (rounded to one decimal place):
Person      Distance from c1=(120,32)    Distance from c2=(113,33)
Person 1            52.3                         58.3
Person 2             0.0                          7.1
Person 3             7.1                          0.0
Person 4            70.4                         75.5
Person 5            13.9                          9.4
At this point, we will assign each object to the cluster it is closer to (that is taking the minimum
between the two computed distances for each object).
We can then arrange the points as follows:
Person 1 → cluster 1
Person 2 → cluster 1
Person 3 → cluster 2
Person 4 → cluster 1
Person 5→ cluster 2
Let’s iterate, which means to redefine the centroids by calculating the mean of the members of
each of the two clusters.
So c’1 = ((167+120+175)/3, (55+32+76)/3) = (154, 54.3) and c’2 = ((113+108)/2, (33+25)/2) = (110.5, 29)
Then, we calculate the distances again and re-assign the points to the new centroids. We repeat this
process until the centroids don’t move anymore (or the difference between them is under a certain small
threshold). In our case, the result we get is given in the figure below. You can see the two different
clusters labelled with two different colours and the position of the centroids, given by the crosses.
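The whole procedure above can be reproduced with a short NumPy sketch (assuming NumPy is installed), using the five (height, weight) points and the same initial centroids.
# Manual k-means on the five people from the example (k = 2)
import numpy as np

points = np.array([[167, 55], [120, 32], [113, 33], [175, 76], [108, 25]], dtype=float)
centroids = np.array([[120, 32], [113, 33]], dtype=float)   # Person 2 and Person 3

for _ in range(10):                              # iterate until the centroids stop moving
    # Assign each point to its nearest centroid (Euclidean distance).
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    assignment = dists.argmin(axis=1)
    # Recompute each centroid as the mean of the points assigned to it.
    new_centroids = np.array([points[assignment == k].mean(axis=0) for k in range(2)])
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print(assignment)    # final cluster index for each of the five people
print(centroids)     # final centroid positions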