Introduction to Machine Learning
Machine learning is a branch of artificial intelligence and computer science that focuses on the
use of data and algorithms to imitate the way that humans learn, gradually improving in
accuracy.
In 1997, Tom Mitchell gave a “well-posed” mathematical and relational definition of Machine
Learning: “A computer program is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by P, improves with
experience E.”
• Predictor Variable: A feature (or set of features) of the data that can be used to predict the output.
• Response Variable: The feature or output variable that needs to be predicted by using the predictor variable(s).
• Training Data: The Machine Learning model is built using the training data. The training data helps the model identify key trends and patterns essential to predict the output.
• Testing Data: After the model is trained, it must be tested to evaluate how accurately it can predict an outcome. This is done using the testing data set.
Step 1: Define the Objective of the Problem
At this step, we must understand what exactly needs to be predicted. In our case, the objective is
to predict the possibility of rain by studying weather conditions. At this stage, it is also essential
to take mental notes on what kind of data can be used to solve this problem and the type of
approach you must follow to get to the solution.
Step 2: Data Gathering
Once you know the type of data that is required, you must understand how you can obtain this
data. Data collection can be done manually or by web scraping. However, if you are a beginner
just looking to learn Machine Learning, you don't have to worry about getting the data; there are
thousands of data sets on the web that you can simply download and get going.
Coming back to the problem at hand, the data needed for weather forecasting includes measures
such as humidity level, temperature, pressure, locality, whether or not you live in a hill station,
etc. Such data must be collected and stored for analysis.
Step 3: Data Preparation
The data you collect is almost never in the right format. You will encounter a lot of
inconsistencies in the data set, such as missing values, redundant variables, duplicate values, etc.
Removing such inconsistencies is essential because they might lead to wrongful computations
and predictions. Therefore, at this stage, you scan the data set for any inconsistencies and fix
them then and there.
Step 4: Exploratory Data Analysis
Grab your detective glasses, because this stage is all about diving deep into the data and finding
all the hidden data mysteries. EDA, or Exploratory Data Analysis, is the brainstorming stage of
Machine Learning. Data exploration involves understanding the patterns and trends in the data.
At this stage, all the useful insights are drawn and correlations between the variables are
understood.
For example, in the case of predicting rainfall, we know that there is a strong possibility of rain if
the temperature has fallen low. Such correlations must be understood and mapped at this stage.
Step 5: Building a Machine Learning Model
All the insights and patterns derived during Data Exploration are used to build the Machine
Learning model. This stage always begins by splitting the data set into two parts: training data
and testing data. The training data is used to build and analyze the model. The logic of the
model is based on the Machine Learning algorithm that is being implemented.
In the case of predicting rainfall, since the output will be in the form of True (if it will rain
tomorrow) or False (no rain tomorrow), we can use a Classification Algorithm such as Logistic
Regression.
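A minimal sketch of this step, assuming a pandas DataFrame with hypothetical columns humidity, temperature, and pressure and a True/False rain_tomorrow label (the file name weather.csv is also a placeholder):

```python
# Minimal sketch: building a logistic-regression rain classifier.
# Column and file names are hypothetical placeholders, not a real data set.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("weather.csv")                     # hypothetical weather data
X = df[["humidity", "temperature", "pressure"]]     # predictor variables
y = df["rain_tomorrow"]                             # response variable (True/False)

# Split the data set into training data and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                         # build the model on the training data
```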
Step 6: Model Evaluation and Optimization
After building a model by using the training data set, it is finally time to put the model to the
test. The testing data set is used to check the efficiency of the model and how accurately it can
predict the outcome. Once the accuracy is calculated, any further improvements to the model
can be implemented at this stage. Methods like parameter tuning and cross-validation can be
used to improve the performance of the model.
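Continuing the sketch above, a hedged example of this evaluation and tuning step (the grid of C values is an arbitrary choice):

```python
# Minimal sketch: evaluating and tuning the model built in the previous example.
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV

# Check how accurately the model predicts on the unseen testing data
predictions = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, predictions))

# Parameter tuning with 5-fold cross-validation: try several values of the
# regularization strength C and keep the best one.
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                    cv=5)
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
print("Cross-validated accuracy:", grid.best_score_)
```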
Step 7: Predictions
Once the model is evaluated and improved, it is finally used to make predictions. The final
output can be a categorical variable (e.g., True or False) or a continuous quantity (e.g., the
predicted value of a stock).
In our case, for predicting the occurrence of rainfall, the output will be a categorical variable.
So that was the entire Machine Learning process. Now it’s time to learn about the different ways
in which Machines can learn.
We can train machine learning algorithms by providing them with huge amounts of data and
letting them explore the data, construct models, and predict the required output automatically.
The following are some popular real-world applications of machine learning.
1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to
identify objects, persons, places, digital images, etc. A popular use case of image recognition
and face detection is the automatic friend tagging suggestion:
Facebook provides a feature of automatic friend tagging suggestions. Whenever we upload a
photo with our Facebook friends, we automatically get a tagging suggestion with their names;
the technology behind this is machine learning's face detection and recognition algorithm.
It is based on the Facebook project named "DeepFace," which is responsible for face
recognition and person identification in the picture.
2. Speech Recognition:
While using Google, we get an option of "Search by voice"; this comes under speech
recognition, a popular application of machine learning.
Speech recognition is the process of converting voice instructions into text, and it is also known
as "speech to text" or "computer speech recognition." At present, machine learning algorithms
are widely used in various speech recognition applications. Google Assistant, Siri, Cortana,
and Alexa use speech recognition technology to follow voice instructions.
3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the correct
path with the shortest route and predicts the traffic conditions.
4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment companies such
as Amazon, Netflix, etc., for product recommendations to the user. Whenever we search for a
product on Amazon, we start getting advertisements for the same product while surfing the
internet on the same browser; this is because of machine learning. Google understands the
user's interest using various machine learning algorithms and suggests products as per customer
interest.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars. Machine learning
plays a significant role in self-driving cars. Tesla, one of the most popular car manufacturers, is
working on self-driving cars, using machine learning methods to train its car models to detect
people and objects while driving.
6. Email Spam and Malware Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal, or spam.
We always receive important mail in our inbox marked with the important symbol and spam
emails in our spam box; the technology behind this is machine learning.
7. Virtual Personal Assistants:
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri.
As the name suggests, they help us find information using our voice instructions. These
assistants can help us in various ways just by our voice instructions, such as playing music,
calling someone, opening an email, scheduling an appointment, etc.
8. Online Fraud Detection:
Machine learning is making our online transactions safe and secure by detecting fraudulent
transactions. Whenever we perform an online transaction, there are various ways a fraudulent
transaction can take place, such as fake accounts, fake IDs, and money being stolen in the
middle of a transaction. To detect this, a feed-forward neural network helps by checking whether
it is a genuine transaction or a fraudulent one.
Machine learning methods can be classified into four types:
1. Supervised learning
2. Unsupervised learning
3. Semi supervised learning
4. Reinforcement learning
1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample labeled
data to the machine learning system in order to train it, and on that basis, it predicts the output.
The system creates a model using labeled data to understand the data sets and learn about each
example. Once the training and processing are done, we test the model by providing sample
data to check whether it predicts the correct output or not.
For Example:
• Let us consider images that are labeled as a spoon or a knife. This known data is fed to the
machine, which analyzes and learns the associations of these images based on features
such as shape, size, sharpness, etc.
• Now when a new image is fed to the machine without any label, the machine is able to
predict accurately that it is a spoon with the help of the past data.
The goal of supervised learning is to map input data to the output data. Supervised learning is
based on supervision, in the same way that a student learns things under the supervision of a
teacher. An example of supervised learning is spam filtering.
• Classification is used when the output variable is categorical i.e. with 2 or more classes.
• For example, yes or no, male or female, true or false, etc.
• In order to predict whether a mail is spam or not, we need to first teach the machine what
a spam mail is.
• This is done based on a number of spam filters: reviewing the content of the mail, reviewing
the mail header, and then searching for whether it contains any false information.
• Certain keyword and blacklist filters are used to flag mails coming from already blacklisted
spammers.
• All of these features are used to score the mail and give it a spam score. The lower the
total spam score of the email, the more likely that it is not spam.
• Based on the content, label, and spam score of the new incoming mail, the algorithm
decides whether it should land in the inbox or spam folder, as sketched in the example below.
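As a toy illustration of the scoring idea above, the following sketch assigns made-up weights to a handful of keywords and files a mail into the inbox or the spam folder based on its total score (the keywords, weights, and threshold are all invented for illustration, not taken from a real spam filter):

```python
# Toy keyword-based spam scoring; keyword weights and threshold are invented.
SPAM_KEYWORDS = {"free": 3, "winner": 4, "prize": 4, "urgent": 2, "click": 1}
SPAM_THRESHOLD = 5  # mails scoring at or above this land in the spam folder

def spam_score(mail_text: str) -> int:
    """Add up the weights of every spam keyword found in the mail body."""
    text = mail_text.lower()
    return sum(weight for word, weight in SPAM_KEYWORDS.items() if word in text)

def classify(mail_text: str) -> str:
    return "spam" if spam_score(mail_text) >= SPAM_THRESHOLD else "inbox"

print(classify("You are a winner! Click to claim your free prize"))  # -> spam
print(classify("Meeting moved to 3 pm tomorrow"))                    # -> inbox
```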
• Regression is used when the output variable is a real or continuous value, for example temperature or humidity.
Let's consider two variables: humidity and temperature. Here, 'temperature' is the independent
variable and ‘humidity' is the dependent variable. If the temperature increases, then the humidity
decreases.
These two variables are fed to the model and the machine learns the relationship between them.
After the machine is trained, it can easily predict the humidity based on the given temperature.
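A minimal regression sketch of this idea, using a few made-up temperature and humidity readings rather than real measurements:

```python
# Minimal sketch: learn the temperature -> humidity relationship with
# linear regression.  The readings below are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

temperature = np.array([[10], [15], [20], [25], [30], [35]])  # independent variable
humidity = np.array([85, 78, 70, 62, 55, 48])                 # dependent variable

model = LinearRegression().fit(temperature, humidity)
print("Predicted humidity at 28 degrees:", model.predict([[28]])[0])
```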
Supervised learning (learning a rule from data) can be used for:
• Prediction of future cases: use the rule to predict the output for future inputs
• Knowledge extraction: The rule is easy to understand
• Compression: The rule is simpler than the data it explains
• Outlier detection: Exceptions that are not covered by the rule, e.g., fraud
Disadvantages of supervised learning:
• Slow: it requires human experts to manually label training examples one by one.
• Costly: a model should be trained on large volumes of hand-labeled data to provide
accurate predictions.
2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any supervision.
The training is provided to the machine with the set of data that has not been labeled, classified,
or categorized, and the algorithm needs to act on that data without any supervision. The goal of
unsupervised learning is to restructure the input data into new features or a group of objects
with similar patterns.
Example:
• Let's take a similar example as before, but this time we do not tell the machine whether
it's a spoon or a knife.
• The machine identifies patterns from the given set and groups them based on their
patterns, similarities, etc.
In unsupervised learning, we don't have a predetermined result. The machine tries to find useful
insights from a huge amount of data. Unsupervised learning can be further classified into two
categories of algorithms:
o Clustering
o Association
• Clustering is the method of dividing objects into clusters that are similar to each other and
dissimilar to the objects belonging to other clusters.
• For example, finding out which customers made similar product purchases.
Example:
• Suppose a telecom company wants to reduce its customer churn rate by providing
personalized call and data plans.
• The behavior of the customers is studied and the model segments the customers with
similar traits.
• Group A customers use more data and also have high call durations. Group B customers
are heavy internet users, while Group C customers have high call durations.
• So, Group B will be given more data benefit plans, Group C will be given cheaper call
rate plans, and Group A will be given the benefits of both. A small clustering sketch
follows this example.
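The sketch below mimics this segmentation with K-Means on a tiny made-up usage table (the data usage and call minutes per customer are invented values, and the choice of 3 clusters is an assumption):

```python
# Minimal sketch: segmenting customers by usage with K-Means.
import numpy as np
from sklearn.cluster import KMeans

# Each row: [monthly data usage in GB, monthly call minutes] (invented data)
customers = np.array([
    [25, 600], [30, 550],   # heavy data and heavy calls (Group A)
    [28,  80], [35,  60],   # heavy data, few calls      (Group B)
    [ 2, 700], [ 3, 650],   # little data, heavy calls   (Group C)
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print("Cluster label per customer:", kmeans.labels_)
print("Cluster centres:", kmeans.cluster_centers_)
```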
• Association is a rule-based method for discovering relationships between variables in a large
data set, for example, products that tend to be bought together (a small sketch follows this example).
• Let's say that a customer goes to a supermarket and buys bread, milk, fruits, and wheat.
Another customer comes and buys bread, milk, rice, and butter.
• Now, when another customer comes, it is highly likely that if he buys bread, he will buy
milk too.
• Hence, a relationship is established based on customer behavior and recommendations
are made.
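A tiny sketch of how such a relationship can be quantified with the usual support and confidence measures, using invented shopping baskets:

```python
# Toy sketch: measuring the "bread -> milk" association rule.
baskets = [
    {"bread", "milk", "fruits", "wheat"},
    {"bread", "milk", "rice", "butter"},
    {"bread", "butter"},
    {"milk", "fruits"},
]

bread_count = sum("bread" in b for b in baskets)
bread_and_milk = sum({"bread", "milk"} <= b for b in baskets)

support = bread_and_milk / len(baskets)      # how often the pair appears overall
confidence = bread_and_milk / bread_count    # how often milk follows bread
print(f"support = {support:.2f}, confidence = {confidence:.2f}")
```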
Unsupervised learning is used for tasks such as:
1. Similarity detection
2. Automatic labeling
3. Object segmentation (such as person, animal, film)
The goal in such unsupervised learning problems may be to discover groups of similar examples
within the data, where it is called clustering, or to determine the distribution of data within the
input space, known as density estimation, or to project the data from a high-dimensional space
down to two or three dimensions for the purpose of visualization.
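As a small illustration of the visualization goal, the sketch below projects the 4-dimensional Iris data set (used here purely as a convenient stand-in) down to two dimensions with PCA:

```python
# Minimal sketch: project high-dimensional data down to 2-D for visualization.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)       # 150 samples, 4 features each
X_2d = PCA(n_components=2).fit_transform(X)
print("Original shape:", X.shape)       # (150, 4)
print("Projected shape:", X_2d.shape)   # (150, 2)
```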
3) Semi-Supervised Learning
Semi-supervised learning is supervised learning where the training data contains very few
labeled examples and a large number of unlabeled examples.
The goal of a semi-supervised learning model is to make effective use of all of the available data,
not just the labeled data like in supervised learning.
• Semi-supervised learning is an important category that lies between the Supervised and
Unsupervised machine learning.
• To overcome the drawbacks of supervised learning and unsupervised learning
algorithms, the concept of Semi-supervised learning is introduced.
• Labeled data exists in a very small amount, while the data set contains a huge amount of
unlabeled data.
• Initially, similar data is clustered using an unsupervised learning algorithm, and the clusters
are then used to label the unlabeled data.
• This is useful because labeled data is comparatively more expensive to acquire than
unlabeled data. A small sketch follows this list.
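A minimal sketch of this idea with scikit-learn's LabelPropagation, where -1 marks the unlabeled examples; the tiny one-dimensional data set is invented for illustration:

```python
# Minimal sketch: semi-supervised learning with only two labeled points.
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.array([[1.0], [1.2], [1.1], [8.0], [8.3], [7.9]])
y = np.array([0, -1, -1, 1, -1, -1])          # -1 marks unlabeled examples

model = LabelPropagation().fit(X, y)
print("Inferred labels for all points:", model.transduction_)
```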
4) Reinforcement Learning
In many complex domains, reinforcement learning is the only feasible way to train a program to
perform at high levels. For example, in game playing, it is very hard for a human to provide
accurate and consistent evaluations of large numbers of positions, which would be needed to
train an evaluation function directly from examples. Instead, the program can be told when it has
won or lost, and it can use this information to learn an evaluation function that gives reasonably
accurate estimates of the probability of winning from any given position.
• Suppose there is an AI agent present within a maze environment, and his goal is to find
the diamond.
• The agent interacts with the environment by performing some actions, and based on those
actions, the state of the agent gets changed, and it also receives a reward or penalty as
feedback.
• The agent continues doing these three things (take action, change state/remain in the
same state, and get feedback), and by doing these actions, he learns and explores the
environment.
• The agent learns which actions lead to positive feedback or rewards and which actions
lead to negative feedback or penalties.
• As a positive reward, the agent gets a positive point, and as a penalty, it gets a negative
point. A toy sketch of this maze follows the list.
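This toy Q-learning sketch captures the maze idea on a one-dimensional corridor of 5 cells with the diamond in the last cell; the layout, rewards, and hyper-parameters are illustrative assumptions rather than part of any standard benchmark:

```python
# Toy Q-learning: the agent learns to walk right towards the diamond.
import random

N_STATES, ACTIONS = 5, [-1, +1]            # actions: move left or move right
alpha, gamma, epsilon = 0.5, 0.9, 0.1      # learning rate, discount, exploration
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-value per (state, action)

for _ in range(200):                       # episodes
    state = 0
    while state != N_STATES - 1:           # until the diamond cell is reached
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        a = random.randrange(2) if random.random() < epsilon else Q[state].index(max(Q[state]))
        nxt = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        reward = 10.0 if nxt == N_STATES - 1 else -1.0   # penalty for every extra step
        Q[state][a] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][a])
        state = nxt

print("Best action per state (0 = left, 1 = right):", [q.index(max(q)) for q in Q])
```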
| | Supervised learning | Unsupervised learning | Semi-supervised learning | Reinforcement learning |
|---|---|---|---|---|
| Definition | Learns by using labeled data | Trained using unlabeled data without any guidance | Trained using both labeled & unlabeled data | Works by interacting with the environment |
| Type of data | Labeled data | Unlabeled data | Both labeled and unlabeled data | No pre-defined data |
| Type of problems | Regression and classification | Association and clustering | Classification and regression | Exploitation or exploration |
| Algorithms | Linear Regression, Logistic Regression, SVM, KNN, etc. | K-Means | Text document classifiers | Q-Learning, SARSA |
| Aim | Calculate outcomes | Discover underlying patterns | Classify the data and also discover underlying patterns | Learn a series of actions |
| Application | Risk evaluation, sales forecasting | Recommendation systems, anomaly detection | Text classification | Self-driving cars, gaming, healthcare |
Batch learning is also called offline learning. The models trained using batch or offline learning
are moved into production only at regular intervals based on the performance of models trained
with new data.
Building offline models, or models trained in a batch manner, requires training the models with
the entire training data set. Improving model performance requires re-training all over again
with the entire training data set. These models are static in nature, which means that once they
are trained, their performance will not improve until a new model is re-trained. Offline models,
or models trained using batch learning, are deployed in the production environment by replacing
the old model with the newly trained model.
There can be various reasons why we may choose to adopt batch learning for training models.
If models trained using batch learning need to learn about new data, they have to be retrained on
the new data set and then, based on criteria such as model performance, used to replace the
model already in production. The whole process of batch learning can be automated as well.
The disadvantage of batch learning is that it takes a lot of time and resources to re-train the
model.
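A minimal sketch of such a batch retraining step; the file names, the label column, and the choice of logistic regression are hypothetical placeholders:

```python
# Minimal sketch: batch (offline) retraining and model replacement.
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression

def retrain_and_deploy(data_path="all_training_data.csv",
                       model_path="production_model.joblib"):
    df = pd.read_csv(data_path)                     # the ENTIRE training set
    X, y = df.drop(columns=["label"]), df["label"]
    model = LogisticRegression(max_iter=1000).fit(X, y)
    joblib.dump(model, model_path)                  # replace the old model artifact

# The serving code simply loads the newest artifact at regular intervals:
# model = joblib.load("production_model.joblib")
```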
The decision to retrain a model in a batch manner is typically based on model performance.
Red-amber-green statuses can be used to indicate the health of the model based on its prediction
accuracy or error rates. Accordingly, the models can be chosen to be retrained or left as they are.
The following stakeholders can be involved in reviewing the model performance and leveraging
batch learning:
• Business/product owners
• Product managers
• Data scientists
• ML engineers
In online learning, training happens in an incremental manner by continuously feeding data as it
arrives, or in small groups (mini-batches). Each learning step is fast and cheap, so the system
can learn about new data on the fly, as it arrives.
Online learning is great for machine learning systems that receive data as a continuous
flow (e.g., stock prices) and need to adapt to change rapidly or autonomously. It is also a good
option if you have limited computing resources: once an online learning system has learned
about new data instances, it does not need them anymore, so you can discard them (unless you
want to be able to roll back to a previous state and “replay” the data) or move the data to another
form of storage (warm or cold storage) if you are using a data lake. This can save a huge
amount of space and cost.
Online learning algorithms can also be used to train systems on huge datasets that cannot fit in
one machine’s main memory (this is also called out-of-core learning). The algorithm loads part
of the data, runs a training step on that data, and repeats the process until it has run on all of the
data.
One of the key aspects of online learning is the learning rate: the rate at which you want your
machine learning system to adapt to new data. A system with a high learning rate adapts to new
data rapidly but also tends to forget older data quickly. A system with a low learning rate
behaves more like batch learning.
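A minimal sketch of online (incremental) learning with scikit-learn's SGDClassifier, where data arrives in small simulated chunks, partial_fit updates the model on each chunk, and eta0 plays the role of the learning rate (the synthetic stream is invented for illustration):

```python
# Minimal sketch: incremental / out-of-core learning on a simulated data stream.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(learning_rate="constant", eta0=0.01)  # higher eta0 = adapts faster
classes = np.array([0, 1])                                  # must be declared up front

rng = np.random.default_rng(0)
for _ in range(100):                           # simulate a stream of small batches
    X_chunk = rng.normal(size=(32, 3))
    y_chunk = (X_chunk.sum(axis=1) > 0).astype(int)
    model.partial_fit(X_chunk, y_chunk, classes=classes)    # learn, then discard the chunk

X_new = rng.normal(size=(200, 3))              # a fresh chunk to check performance
y_new = (X_new.sum(axis=1) > 0).astype(int)
print("Accuracy on a fresh chunk:", model.score(X_new, y_new))
```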
One of the big disadvantages of an online learning system is that if it is fed bad data, the
system's performance degrades and the user sees the impact almost instantly. Thus, it is very
important to put an appropriate data governance strategy in place to ensure that the data fed to
the system is of high quality. In addition, it is very important to monitor the performance of the
machine learning system very closely.
The following are some of the challenges for adopting an online learning method:
• Data governance
• Model governance, including appropriate algorithm and model selection on the fly
Online models require only a single deployment in the production setting, and they evolve over
a period of time. The disadvantage of online models is that they do not have the entire data set
available for training. The models are trained incrementally, based on assumptions made from
the data available so far, and those assumptions can at times be sub-optimal.
| | Online learning | Batch (offline) learning |
|---|---|---|
| Complexity | More complex because the model keeps evolving over time as more data becomes available. | Less complex because the model is fed with more consistent data sets periodically. |
Machine learning also has some limitations:
• Noisy data: it is responsible for inaccurate predictions, which affect decisions as well as
accuracy in classification tasks.
• Incorrect data: it is also responsible for faulty results obtained from machine learning
models; hence, incorrect data may affect the accuracy of the results.
• Generalizing output data: sometimes generalizing output data becomes complex, which
results in comparatively poor future actions.
Regular monitoring and maintenance are also required. Different results for different actions
may require changes to the data; hence, editing the code as well as allocating resources to
monitor the models becomes necessary.
Although Machine Learning and Artificial Intelligence are continuously growing in the market,
these fields are still relatively young in comparison to other, more established industries.
Machine learning involves analyzing the data, removing data bias, training models, applying
complex mathematical calculations, and more, which makes the procedure complicated and
quite tedious.
Machine learning models can be highly effective at producing accurate results, but they are
time-consuming: slow programs, excessive data requirements, and data overload mean it takes
more time than expected to obtain accurate results.
Overfitting:
• Whenever a machine learning model is trained with a huge amount of data, it starts
capturing the noise and inaccurate values present in the training data set. This negatively
affects the performance of the model on new data.
• Let's understand this with a simple example where the training data set contains
1000 mangoes, 1000 apples, 1000 bananas, and 5000 papayas. There is then a
considerable probability of identifying an apple as a papaya, because we have a
massive amount of biased data in the training data set.
Underfitting:
• Whenever a machine learning model is trained with too little data, it fails to capture the
underlying patterns and, as a result, produces incomplete and inaccurate predictions,
which destroys the accuracy of the machine learning model (see the sketch below).
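The sketch contrasts underfitting and overfitting on synthetic data by fitting polynomials of increasing degree: a very low degree underfits (high error on both sets), while a very high degree overfits (low training error, noticeably higher test error). The data and the chosen degrees are illustrative assumptions:

```python
# Minimal sketch: underfitting vs. overfitting with polynomial regression.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=60)   # noisy sine wave
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):            # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```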
Probability
Probability is a foundation stone of ML; it tells how likely an event is to occur. The value of a
probability always lies between 0 and 1: a value of 1 indicates that the event is certain to occur,
and 0 indicates that the event will not occur.
Formula:
Probability of an event = (Number of ways the event can occur) / (Total number of outcomes)
When a coin is tossed, there are two possible outcomes: Heads (H) or Tails (T).
Number of ways Heads can happen: 1 (there is only 1 face with an "H" on the coin)
Total number of outcomes: 2 (there are 2 faces altogether)
So the probability of Heads (H) = 1/2, i.e. 50%, and similarly for Tails.
Similarly, when a fair die is rolled, the probability of getting a "4" is:
Number of ways it can happen: 1 (there is only 1 face with a "4" on it)
Total number of outcomes: 6 (there are 6 faces altogether)
So the probability = 1/6
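A quick sketch that checks both results with a simple Monte-Carlo simulation (the number of trials is an arbitrary choice):

```python
# Minimal sketch: estimate the coin and die probabilities by random sampling.
import random

def estimate(event, outcomes, trials=100_000):
    """Estimate P(event) as the fraction of random draws that land in the event."""
    hits = sum(random.choice(outcomes) in event for _ in range(trials))
    return hits / trials

print("P(Heads):     theory =", 1 / 2, " simulated =", estimate({"H"}, ["H", "T"]))
print("P(rolling 4): theory =", round(1 / 6, 4), " simulated =", estimate({4}, [1, 2, 3, 4, 5, 6]))
```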