Introduction to Machine Learning
Machine learning is a branch of artificial intelligence and computer science that focuses on the
use of data and algorithms to imitate the way that humans learn, gradually improving in
accuracy.
In 1997, Tom Mitchell gave a “well-posed” mathematical and relational definition of Machine
Learning: “A computer program is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by P, improves with
experience E.”
• Predictor Variable: A feature (or set of features) of the data that can be used to predict the output.
• Response Variable: The feature or output variable that needs to be predicted by using the predictor variable(s).
• Training Data: The Machine Learning model is built using the training data. The training data helps the model identify key trends and patterns essential to predict the output.
• Testing Data: After the model is trained, it must be tested to evaluate how accurately it can predict an outcome. This is done using the testing data set.
Step 1: Define the Objective of the Problem
At this step, we must understand what exactly needs to be predicted. In our case, the objective is
to predict the possibility of rain by studying weather conditions. At this stage, it is also essential
to take mental notes on what kind of data can be used to solve this problem and the type of
approach you must follow to get to the solution.
Step 2: Data Gathering
Once you know the type of data that is required, you must understand how you can obtain this
data. Data collection can be done manually or by web scraping. However, if you are a beginner
just looking to learn Machine Learning, you don't have to worry about getting the data; there are
thousands of data sets on the web that you can simply download and get going.
Coming back to the problem at hand, the data needed for weather forecasting includes measures
such as humidity level, temperature, pressure, locality, whether or not you live in a hill station,
etc. Such data must be collected and stored for analysis.
Step 3: Data Preparation
The data you collect is almost never in the right format. You will encounter a lot of
inconsistencies in the data set, such as missing values, redundant variables, duplicate values, etc.
Removing such inconsistencies is essential because they might lead to wrongful computations
and predictions. Therefore, at this stage, you scan the data set for any inconsistencies and fix
them then and there.
Step 4: Exploratory Data Analysis
Grab your detective glasses, because this stage is all about diving deep into the data and finding
all the hidden data mysteries. EDA, or Exploratory Data Analysis, is the brainstorming stage of
Machine Learning. Data exploration involves understanding the patterns and trends in the data.
At this stage, all the useful insights are drawn and correlations between the variables are
understood.
For example, in the case of predicting rainfall, we know that there is a strong possibility of rain if
the temperature has fallen low. Such correlations must be understood and mapped at this stage.
Step 5: Building a Machine Learning Model
All the insights and patterns derived during Data Exploration are used to build the Machine
Learning model. This stage always begins by splitting the data set into two parts: training data
and testing data. The training data is used to build and analyze the model. The logic of the
model is based on the Machine Learning algorithm that is being implemented.
In the case of predicting rainfall, since the output will be in the form of True (if it will rain
tomorrow) or False (no rain tomorrow), we can use a Classification Algorithm such as Logistic
Regression.
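A minimal sketch of this step, assuming a pandas DataFrame with hypothetical columns humidity, temperature, and pressure and a True/False rain_tomorrow label (the file name weather.csv is also a placeholder):

```python
# Minimal sketch: building a logistic-regression rain classifier.
# Column and file names are hypothetical placeholders, not a real data set.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("weather.csv")                     # hypothetical weather data
X = df[["humidity", "temperature", "pressure"]]     # predictor variables
y = df["rain_tomorrow"]                             # response variable (True/False)

# Split the data set into training data and testing data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                         # build the model on the training data
```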
Step 6: Model Evaluation and Optimization
After building a model by using the training data set, it is finally time to put the model to the
test. The testing data set is used to check the efficiency of the model and how accurately it can
predict the outcome. Once the accuracy is calculated, any further improvements to the model
can be implemented at this stage. Methods like parameter tuning and cross-validation can be
used to improve the performance of the model.
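Continuing the sketch above, a hedged example of this evaluation and tuning step (the grid of C values is an arbitrary choice):

```python
# Minimal sketch: evaluating and tuning the model built in the previous example.
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV

# Check how accurately the model predicts on the unseen testing data
predictions = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, predictions))

# Parameter tuning with 5-fold cross-validation: try several values of the
# regularization strength C and keep the best one.
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                    cv=5)
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
print("Cross-validated accuracy:", grid.best_score_)
```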
Step 7: Predictions
Once the model is evaluated and improved, it is finally used to make predictions. The final
output can be a categorical variable (e.g., True or False) or a continuous quantity (e.g., the
predicted value of a stock).
In our case, for predicting the occurrence of rainfall, the output will be a categorical variable.
So that was the entire Machine Learning process. Now it’s time to learn about the different ways
in which Machines can learn.
We can train machine learning algorithms by providing them with huge amounts of data and
letting them explore the data, construct models, and predict the required output automatically.
The following are some popular real-world applications of machine learning.
1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to
identify objects, persons, places, digital images, etc. A popular use case of image recognition
and face detection is the automatic friend tagging suggestion:
Facebook provides a feature of automatic friend tagging suggestions. Whenever we upload a
photo with our Facebook friends, we automatically get a tagging suggestion with their names;
the technology behind this is machine learning's face detection and recognition algorithm.
It is based on the Facebook project named "DeepFace," which is responsible for face
recognition and person identification in the picture.
2. Speech Recognition:
While using Google, we get an option of "Search by voice"; this comes under speech
recognition, a popular application of machine learning.
Speech recognition is the process of converting voice instructions into text, and it is also known
as "speech to text" or "computer speech recognition." At present, machine learning algorithms
are widely used in various speech recognition applications. Google Assistant, Siri, Cortana,
and Alexa use speech recognition technology to follow voice instructions.
3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the correct
path with the shortest route and predicts the traffic conditions.
4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment companies such
as Amazon, Netflix, etc., for product recommendations to the user. Whenever we search for a
product on Amazon, we start getting advertisements for the same product while surfing the
internet on the same browser; this is because of machine learning. Google understands the
user's interest using various machine learning algorithms and suggests products as per customer
interest.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars. Machine learning
plays a significant role in self-driving cars. Tesla, one of the most popular car manufacturers, is
working on self-driving cars, using machine learning methods to train its car models to detect
people and objects while driving.
6. Email Spam and Malware Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal, or spam.
We always receive important mail in our inbox marked with the important symbol and spam
emails in our spam box; the technology behind this is machine learning.
7. Virtual Personal Assistants:
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri.
As the name suggests, they help us find information using our voice instructions. These
assistants can help us in various ways just by our voice instructions, such as playing music,
calling someone, opening an email, scheduling an appointment, etc.
8. Online Fraud Detection:
Machine learning is making our online transactions safe and secure by detecting fraudulent
transactions. Whenever we perform an online transaction, there are various ways a fraudulent
transaction can take place, such as fake accounts, fake IDs, and money being stolen in the
middle of a transaction. To detect this, a feed-forward neural network helps by checking whether
it is a genuine transaction or a fraudulent one.
Machine learning methods can be classified into four types:
1. Supervised learning
2. Unsupervised learning
3. Semi supervised learning
4. Reinforcement learning
1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample labeled
data to the machine learning system in order to train it, and on that basis, it predicts the output.
The system creates a model using labeled data to understand the data sets and learn about each
example. Once the training and processing are done, we test the model by providing sample
data to check whether it predicts the correct output or not.
For Example:
• Let us consider images that are labeled as a spoon or a knife. This known data is fed to the
machine, which analyzes and learns the associations of these images based on features
such as shape, size, sharpness, etc.
• Now when a new image is fed to the machine without any label, the machine is able to
predict accurately that it is a spoon with the help of the past data.
The goal of supervised learning is to map input data to the output data. Supervised learning is
based on supervision, in the same way that a student learns things under the supervision of a
teacher. An example of supervised learning is spam filtering.
• Classification is used when the output variable is categorical i.e. with 2 or more classes.
• For example, yes or no, male or female, true or false, etc.
• In order to predict whether a mail is spam or not, we need to first teach the machine what
a spam mail is.
• This is done based on a number of spam filters: reviewing the content of the mail, reviewing
the mail header, and then searching for whether it contains any false information.
• Certain keyword and blacklist filters are used to flag mails coming from already blacklisted
spammers.
• All of these features are used to score the mail and give it a spam score. The lower the
total spam score of the email, the more likely that it is not spam.
• Based on the content, label, and spam score of the new incoming mail, the algorithm
decides whether it should land in the inbox or spam folder, as sketched in the example below.
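As a toy illustration of the scoring idea above, the following sketch assigns made-up weights to a handful of keywords and files a mail into the inbox or the spam folder based on its total score (the keywords, weights, and threshold are all invented for illustration, not taken from a real spam filter):

```python
# Toy keyword-based spam scoring; keyword weights and threshold are invented.
SPAM_KEYWORDS = {"free": 3, "winner": 4, "prize": 4, "urgent": 2, "click": 1}
SPAM_THRESHOLD = 5  # mails scoring at or above this land in the spam folder

def spam_score(mail_text: str) -> int:
    """Add up the weights of every spam keyword found in the mail body."""
    text = mail_text.lower()
    return sum(weight for word, weight in SPAM_KEYWORDS.items() if word in text)

def classify(mail_text: str) -> str:
    return "spam" if spam_score(mail_text) >= SPAM_THRESHOLD else "inbox"

print(classify("You are a winner! Click to claim your free prize"))  # -> spam
print(classify("Meeting moved to 3 pm tomorrow"))                    # -> inbox
```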
• Regression is used when the output variable is a real or continuous value, for example temperature or humidity.
Let's consider two variables: humidity and temperature. Here, 'temperature' is the independent
variable and ‘humidity' is the dependent variable. If the temperature increases, then the humidity
decreases.
These two variables are fed to the model and the machine learns the relationship between them.
After the machine is trained, it can easily predict the humidity based on the given temperature.
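A minimal regression sketch of this idea, using a few made-up temperature and humidity readings rather than real measurements:

```python
# Minimal sketch: learn the temperature -> humidity relationship with
# linear regression.  The readings below are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

temperature = np.array([[10], [15], [20], [25], [30], [35]])  # independent variable
humidity = np.array([85, 78, 70, 62, 55, 48])                 # dependent variable

model = LinearRegression().fit(temperature, humidity)
print("Predicted humidity at 28 degrees:", model.predict([[28]])[0])
```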
Supervised learning (learning a rule from data) can be used for:
• Prediction of future cases: use the rule to predict the output for future inputs
• Knowledge extraction: The rule is easy to understand
• Compression: The rule is simpler than the data it explains
• Outlier detection: Exceptions that are not covered by the rule, e.g., fraud
Disadvantages of supervised learning:
• Slow: it requires human experts to manually label training examples one by one.
• Costly: a model should be trained on large volumes of hand-labeled data to provide
accurate predictions.
2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any supervision.
The training is provided to the machine with the set of data that has not been labeled, classified,
or categorized, and the algorithm needs to act on that data without any supervision. The goal of
unsupervised learning is to restructure the input data into new features or a group of objects
with similar patterns.
Example:
• Let's take a similar example as before, but this time we do not tell the machine whether
it's a spoon or a knife.
• The machine identifies patterns from the given set and groups them based on their
patterns, similarities, etc.
In unsupervised learning, we don't have a predetermined result. The machine tries to find useful
insights from a huge amount of data. Unsupervised learning can be further classified into two
categories of algorithms:
o Clustering
o Association
• Clustering is the method of dividing objects into clusters that are similar to each other and
dissimilar to the objects belonging to other clusters.
• For example, finding out which customers made similar product purchases.
Example:
• Suppose a telecom company wants to reduce its customer churn rate by providing
personalized call and data plans.
• The behavior of the customers is studied and the model segments the customers with
similar traits.
• Group A customers use more data and also have high call durations. Group B customers
are heavy internet users, while Group C customers have high call durations.
• So, Group B will be given more data benefit plans, Group C will be given cheaper call
rate plans, and Group A will be given the benefits of both. A small clustering sketch
follows this example.
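The sketch below mimics this segmentation with K-Means on a tiny made-up usage table (the data usage and call minutes per customer are invented values, and the choice of 3 clusters is an assumption):

```python
# Minimal sketch: segmenting customers by usage with K-Means.
import numpy as np
from sklearn.cluster import KMeans

# Each row: [monthly data usage in GB, monthly call minutes] (invented data)
customers = np.array([
    [25, 600], [30, 550],   # heavy data and heavy calls (Group A)
    [28,  80], [35,  60],   # heavy data, few calls      (Group B)
    [ 2, 700], [ 3, 650],   # little data, heavy calls   (Group C)
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print("Cluster label per customer:", kmeans.labels_)
print("Cluster centres:", kmeans.cluster_centers_)
```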
• Association is a rule-based method for discovering relationships between variables in a large
data set, for example, products that tend to be bought together (a small sketch follows this example).
• Let's say that a customer goes to a supermarket and buys bread, milk, fruits, and wheat.
Another customer comes and buys bread, milk, rice, and butter.
• Now, when another customer comes, it is highly likely that if he buys bread, he will buy
milk too.
• Hence, a relationship is established based on customer behavior and recommendations
are made.
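A tiny sketch of how such a relationship can be quantified with the usual support and confidence measures, using invented shopping baskets:

```python
# Toy sketch: measuring the "bread -> milk" association rule.
baskets = [
    {"bread", "milk", "fruits", "wheat"},
    {"bread", "milk", "rice", "butter"},
    {"bread", "butter"},
    {"milk", "fruits"},
]

bread_count = sum("bread" in b for b in baskets)
bread_and_milk = sum({"bread", "milk"} <= b for b in baskets)

support = bread_and_milk / len(baskets)      # how often the pair appears overall
confidence = bread_and_milk / bread_count    # how often milk follows bread
print(f"support = {support:.2f}, confidence = {confidence:.2f}")
```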
Unsupervised learning is used for tasks such as:
1. Similarity detection
2. Automatic labeling
3. Object segmentation (such as person, animal, film)
The goal in such unsupervised learning problems may be to discover groups of similar examples
within the data, where it is called clustering, or to determine the distribution of data within the
input space, known as density estimation, or to project the data from a high-dimensional space
down to two or three dimensions for the purpose of visualization.
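As a small illustration of the visualization goal, the sketch below projects the 4-dimensional Iris data set (used here purely as a convenient stand-in) down to two dimensions with PCA:

```python
# Minimal sketch: project high-dimensional data down to 2-D for visualization.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)       # 150 samples, 4 features each
X_2d = PCA(n_components=2).fit_transform(X)
print("Original shape:", X.shape)       # (150, 4)
print("Projected shape:", X_2d.shape)   # (150, 2)
```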
3) Semi-Supervised Learning
Semi-supervised learning is supervised learning where the training data contains very few
labeled examples and a large number of unlabeled examples.
The goal of a semi-supervised learning model is to make effective use of all of the available data,
not just the labeled data like in supervised learning.
• Semi-supervised learning is an important category that lies between the Supervised and
Unsupervised machine learning.
• To overcome the drawbacks of supervised learning and unsupervised learning
algorithms, the concept of Semi-supervised learning is introduced.
• Labeled data exists in a very small amount, while the data set contains a huge amount of
unlabeled data.
• Initially, similar data is clustered using an unsupervised learning algorithm, and the clusters
are then used to label the unlabeled data.
• This is useful because labeled data is comparatively more expensive to acquire than
unlabeled data. A small sketch follows this list.
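A minimal sketch of this idea with scikit-learn's LabelPropagation, where -1 marks the unlabeled examples; the tiny one-dimensional data set is invented for illustration:

```python
# Minimal sketch: semi-supervised learning with only two labeled points.
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.array([[1.0], [1.2], [1.1], [8.0], [8.3], [7.9]])
y = np.array([0, -1, -1, 1, -1, -1])          # -1 marks unlabeled examples

model = LabelPropagation().fit(X, y)
print("Inferred labels for all points:", model.transduction_)
```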
4) Reinforcement Learning
In many complex domains, reinforcement learning is the only feasible way to train a program to
perform at high levels. For example, in game playing, it is very hard for a human to provide
accurate and consistent evaluations of large numbers of positions, which would be needed to
train an evaluation function directly from examples. Instead, the program can be told when it has
won or lost, and it can use this information to learn an evaluation function that gives reasonably
accurate estimates of the probability of winning from any given position.
• Suppose there is an AI agent present within a maze environment, and his goal is to find
the diamond.
• The agent interacts with the environment by performing some actions, and based on those
actions, the state of the agent gets changed, and it also receives a reward or penalty as
feedback.
• The agent continues doing these three things (take action, change state/remain in the
same state, and get feedback), and by doing these actions, he learns and explores the
environment.
• The agent learns which actions lead to positive feedback or rewards and which actions
lead to negative feedback or penalties.
• As a positive reward, the agent gets a positive point, and as a penalty, it gets a negative
point. A toy sketch of this maze follows the list.
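This toy Q-learning sketch captures the maze idea on a one-dimensional corridor of 5 cells with the diamond in the last cell; the layout, rewards, and hyper-parameters are illustrative assumptions rather than part of any standard benchmark:

```python
# Toy Q-learning: the agent learns to walk right towards the diamond.
import random

N_STATES, ACTIONS = 5, [-1, +1]            # actions: move left or move right
alpha, gamma, epsilon = 0.5, 0.9, 0.1      # learning rate, discount, exploration
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-value per (state, action)

for _ in range(200):                       # episodes
    state = 0
    while state != N_STATES - 1:           # until the diamond cell is reached
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        a = random.randrange(2) if random.random() < epsilon else Q[state].index(max(Q[state]))
        nxt = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        reward = 10.0 if nxt == N_STATES - 1 else -1.0   # penalty for every extra step
        Q[state][a] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][a])
        state = nxt

print("Best action per state (0 = left, 1 = right):", [q.index(max(q)) for q in Q])
```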
| | Supervised learning | Unsupervised learning | Semi-supervised learning | Reinforcement learning |
|---|---|---|---|---|
| Definition | Learns by using labeled data | Trained using unlabeled data without any guidance | Trained using both labeled & unlabeled data | Works by interacting with the environment |
| Type of data | Labeled data | Unlabeled data | Both labeled and unlabeled data | No pre-defined data |
| Type of problems | Regression and classification | Association and clustering | Classification and regression | Exploitation or exploration |
| Algorithms | Linear Regression, Logistic Regression, SVM, KNN, etc. | K-Means | Text document classifiers | Q-Learning, SARSA |
| Aim | Calculate outcomes | Discover underlying patterns | Classify the data and also discover underlying patterns | Learn a series of actions |
| Application | Risk evaluation, sales forecasting | Recommendation systems, anomaly detection | Text classification | Self-driving cars, gaming, healthcare |
Batch learning is also called offline learning. The models trained using batch or offline learning
are moved into production only at regular intervals based on the performance of models trained
with new data.
Building offline models, or models trained in a batch manner, requires training the models with
the entire training data set. Improving model performance requires re-training all over again
with the entire training data set. These models are static in nature, which means that once they
are trained, their performance will not improve until a new model is re-trained. Offline models,
or models trained using batch learning, are deployed in the production environment by replacing
the old model with the newly trained model.
There can be various reasons why we may choose to adopt batch learning for training models.
If models trained using batch learning need to learn about new data, they have to be retrained on
the new data set and then, based on criteria such as model performance, used to replace the
model already in production. The whole process of batch learning can be automated as well.
The disadvantage of batch learning is that it takes a lot of time and resources to re-train the
model.
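A minimal sketch of such a batch retraining step; the file names, the label column, and the choice of logistic regression are hypothetical placeholders:

```python
# Minimal sketch: batch (offline) retraining and model replacement.
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression

def retrain_and_deploy(data_path="all_training_data.csv",
                       model_path="production_model.joblib"):
    df = pd.read_csv(data_path)                     # the ENTIRE training set
    X, y = df.drop(columns=["label"]), df["label"]
    model = LogisticRegression(max_iter=1000).fit(X, y)
    joblib.dump(model, model_path)                  # replace the old model artifact

# The serving code simply loads the newest artifact at regular intervals:
# model = joblib.load("production_model.joblib")
```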
The decision to retrain a model in a batch manner is typically based on model performance.
Red-amber-green statuses can be used to indicate the health of the model based on its prediction
accuracy or error rates. Accordingly, the models can be chosen to be retrained or left as they are.
The following stakeholders can be involved in reviewing the model performance and leveraging
batch learning:
• Business/product owners
• Product managers
• Data scientists
• ML engineers
In online learning, training happens in an incremental manner by continuously feeding data as it
arrives, or in small groups (mini-batches). Each learning step is fast and cheap, so the system
can learn about new data on the fly, as it arrives.
Online learning is great for machine learning systems that receive data as a continuous
flow (e.g., stock prices) and need to adapt to change rapidly or autonomously. It is also a good
option if you have limited computing resources: once an online learning system has learned
about new data instances, it does not need them anymore, so you can discard them (unless you
want to be able to roll back to a previous state and “replay” the data) or move the data to another
form of storage (warm or cold storage) if you are using a data lake. This can save a huge
amount of space and cost.
Online learning algorithms can also be used to train systems on huge datasets that cannot fit in
one machine’s main memory (this is also called out-of-core learning). The algorithm loads part
of the data, runs a training step on that data, and repeats the process until it has run on all of the
data.
One of the key aspects of online learning is the learning rate: the rate at which you want your
machine learning system to adapt to new data. A system with a high learning rate adapts to new
data rapidly but also tends to forget older data quickly. A system with a low learning rate
behaves more like batch learning.
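A minimal sketch of online (incremental) learning with scikit-learn's SGDClassifier, where data arrives in small simulated chunks, partial_fit updates the model on each chunk, and eta0 plays the role of the learning rate (the synthetic stream is invented for illustration):

```python
# Minimal sketch: incremental / out-of-core learning on a simulated data stream.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(learning_rate="constant", eta0=0.01)  # higher eta0 = adapts faster
classes = np.array([0, 1])                                  # must be declared up front

rng = np.random.default_rng(0)
for _ in range(100):                           # simulate a stream of small batches
    X_chunk = rng.normal(size=(32, 3))
    y_chunk = (X_chunk.sum(axis=1) > 0).astype(int)
    model.partial_fit(X_chunk, y_chunk, classes=classes)    # learn, then discard the chunk

X_new = rng.normal(size=(200, 3))              # a fresh chunk to check performance
y_new = (X_new.sum(axis=1) > 0).astype(int)
print("Accuracy on a fresh chunk:", model.score(X_new, y_new))
```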
One of the big disadvantages of an online learning system is that if it is fed bad data, the
system's performance degrades and the user sees the impact almost instantly. Thus, it is very
important to put an appropriate data governance strategy in place to ensure that the data fed to
the system is of high quality. In addition, it is very important to monitor the performance of the
machine learning system very closely.
The following are some of the challenges for adopting an online learning method:
• Data governance
• Model governance, including appropriate algorithm and model selection on the fly
Online models require only a single deployment in the production setting, and they evolve over
a period of time. The disadvantage of online models is that they do not have the entire data set
available for training. The models are trained incrementally, based on assumptions made from
the data available so far, and those assumptions can at times be sub-optimal.
| | Online learning | Batch (offline) learning |
|---|---|---|
| Complexity | More complex because the model keeps evolving over time as more data becomes available. | Less complex because the model is fed with more consistent data sets periodically. |
Machine learning also has some limitations:
• Noisy data: it is responsible for inaccurate predictions, which affect decisions as well as
accuracy in classification tasks.
• Incorrect data: it is also responsible for faulty results obtained from machine learning
models; hence, incorrect data may affect the accuracy of the results.
• Generalizing output data: sometimes generalizing output data becomes complex, which
results in comparatively poor future actions.
Regular monitoring and maintenance are also required. Different results for different actions
may require changes to the data; hence, editing the code as well as allocating resources to
monitor the models becomes necessary.
Although Machine Learning and Artificial Intelligence are continuously growing in the market,
these fields are still relatively young in comparison to other, more established industries.
Machine learning involves analyzing the data, removing data bias, training models, applying
complex mathematical calculations, and more, which makes the procedure complicated and
quite tedious.
Machine learning models can be highly effective at producing accurate results, but they are
time-consuming: slow programs, excessive data requirements, and data overload mean it takes
more time than expected to obtain accurate results.
Overfitting:
• Whenever a machine learning model is trained with a huge amount of data, it starts
capturing the noise and inaccurate values present in the training data set. This negatively
affects the performance of the model on new data.
• Let's understand this with a simple example where the training data set contains
1000 mangoes, 1000 apples, 1000 bananas, and 5000 papayas. There is then a
considerable probability of identifying an apple as a papaya, because we have a
massive amount of biased data in the training data set.
Underfitting:
• Whenever a machine learning model is trained with too little data, it fails to capture the
underlying patterns and, as a result, produces incomplete and inaccurate predictions,
which destroys the accuracy of the machine learning model (see the sketch below).
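The sketch contrasts underfitting and overfitting on synthetic data by fitting polynomials of increasing degree: a very low degree underfits (high error on both sets), while a very high degree overfits (low training error, noticeably higher test error). The data and the chosen degrees are illustrative assumptions:

```python
# Minimal sketch: underfitting vs. overfitting with polynomial regression.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=60)   # noisy sine wave
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):            # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```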
Probability
Probability is a foundation stone of ML; it tells how likely an event is to occur. The value of a
probability always lies between 0 and 1: a value of 1 indicates that the event is certain to occur,
and 0 indicates that the event will not occur.
Formula:
Probability of an event = (Number of ways the event can occur) / (Total number of outcomes)
When a coin is tossed, there are two possible outcomes: Heads (H) or Tails (T).
Number of ways Heads can happen: 1 (there is only 1 face with an "H" on the coin)
Total number of outcomes: 2 (there are 2 faces altogether)
So the probability of Heads (H) = 1/2, i.e. 50%, and similarly for Tails.
Similarly, when a fair die is rolled, the probability of getting a "4" is:
Number of ways it can happen: 1 (there is only 1 face with a "4" on it)
Total number of outcomes: 6 (there are 6 faces altogether)
So the probability = 1/6
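A quick sketch that checks both results with a simple Monte-Carlo simulation (the number of trials is an arbitrary choice):

```python
# Minimal sketch: estimate the coin and die probabilities by random sampling.
import random

def estimate(event, outcomes, trials=100_000):
    """Estimate P(event) as the fraction of random draws that land in the event."""
    hits = sum(random.choice(outcomes) in event for _ in range(trials))
    return hits / trials

print("P(Heads):     theory =", 1 / 2, " simulated =", estimate({"H"}, ["H", "T"]))
print("P(rolling 4): theory =", round(1 / 6, 4), " simulated =", estimate({4}, [1, 2, 3, 4, 5, 6]))
```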