BCS602 – MACHINE LEARNING
MODULE-1 Q & A
INTRODUCTION TO MACHINE LEARNING &
UNDERSTANDING DATA-1
BCS602 -MACHINE LEARNING
Module 1
Introduction: Need for Machine Learning, Machine Learning Explained, Machine
Learning in Relation to other Fields, Types of Machine Learning, Challenges of
Machine Learning, Machine Learning Process, Machine Learning Applications.
Understanding Data – 1: Introduction, Big Data Analysis Framework,
Descriptive Statistics, Univariate Data Analysis and Visualization.
4/2/2025 ELAIYARAJA P
1. Describe machine learning and classify
the different types of machine learning.
Module 1 – Introduction
➢ Machine learning is a subset of Artificial Intelligence (AI) that
enables computers to learn from data and make predictions
without being explicitly programmed.
➢ Machine learning teaches computers to recognize patterns
and make decisions automatically using data and algorithms.
Machine learning can be broadly categorized into three types:
➢ Supervised Learning: Trains models on labeled data to predict or
classify new, unseen data.
➢ Unsupervised Learning: Finds patterns or groups in unlabeled
data, like clustering or dimensionality reduction.
➢ Reinforcement Learning: Learns through trial and error to
maximize rewards, ideal for decision-making tasks.
➢ Supervised Machine Learning: Supervised machine
learning uses labeled datasets to train algorithms to classify data
or predict outcomes.
➢ As input data is fed into the model, its weights are adjusted until
the model fits the data appropriately. Cross-validation is then used
to check that the model is neither overfitted nor underfitted.
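As a hedged illustration of cross-validation, the sketch below splits sample indices into k folds so that each fold serves once as the validation set. The function name `k_fold_indices` and the values n_samples = 7 and k = 3 are assumptions chosen for illustration; they do not come from the slides.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Distribute samples as evenly as possible: the first (n_samples % k)
    # folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=7, k=3))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

In practice a model would be trained on each `train` split and scored on the matching `validation` split. A large gap between training and validation error suggests overfitting, while high error on both suggests underfitting.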
Types of Supervised Learning in Machine Learning
➢ Classification: The output is a categorical variable, and the
goal is to predict discrete labels or categories (e.g., spam vs.
non-spam emails, yes vs. no).
➢ Regression: The output is a continuous variable, and the aim
is to predict continuous numerical values (e.g., house prices,
stock prices).
➢ Semi-supervised learning is a type of machine learning that
falls in between supervised and unsupervised learning. It is a
method that uses a small amount of labeled data and a large
amount of unlabeled data to train a model.
Unsupervised Machine Learning: Unsupervised machine
learning analyses and clusters unlabeled datasets using
machine learning methods. The algorithms find hidden
patterns or data groupings without human interaction. This
method is useful for exploratory data analysis, cross-selling,
consumer segmentation, and image and pattern recognition.
Unsupervised Machine Learning:
The input to the unsupervised learning models is as follows:
▪ Unstructured data: May contain noisy (meaningless) data,
missing values, or unknown data.
▪ Unlabeled data: Contains values only for the input
parameters; there is no target (output) value.
Unlabeled data is easier to collect than the labeled data required by
the supervised approach.
Unsupervised Learning Algorithms
Three main types of algorithms are used for unlabeled
datasets:
▪ Clustering
▪ Association Rule Learning
▪ Dimensionality Reduction
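To make the clustering idea concrete, here is a minimal sketch of k-means on one-dimensional data. The function name `k_means_1d`, the sample points, and the starting centroids are all assumptions for illustration; the slides do not prescribe a specific clustering algorithm.

```python
def k_means_1d(points, centroids, iterations=10):
    """Cluster 1-D points with a basic k-means loop."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabeled data with two visible groups.
data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(data, centroids=[1.0, 12.0])
print(centroids)
print(clusters)
```

No labels are supplied anywhere: the grouping emerges only from the distances between the points.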
➢ Reinforcement Machine Learning: Reinforcement machine
learning is similar to supervised learning, except that the
algorithm is not trained on labeled sample data. Instead, the
model learns by trial and error.
➢ RL involves learning through experience. In RL, an agent
learns to achieve a goal in an uncertain, potentially complex
environment by performing actions and receiving feedback
through rewards or penalties.
Machine Learning Types - Overall
2. Discuss the various stages of machine learning.
Stages of Machine Learning:
Machine Learning Process
3. Explain how supervised machine learning works.
How Supervised Machine Learning works:
➢ Training Data: The model is provided with a training dataset
that includes input data (features) and corresponding output data
(labels or target variables).
➢ Learning Process: The algorithm processes the training data,
learning the relationships between the input features and the
output labels. This is achieved by adjusting the model’s
parameters to minimize the difference between its predictions and
the actual labels.
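The learning process described above can be sketched with gradient descent on a straight-line model. This is a minimal illustration, not the method the slides prescribe: the function name `fit_line`, the learning rate, the number of epochs, and the sample data (generated from y = 2x + 1) are all assumptions.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training example.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Move the parameters against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical labeled data generated from y = 2x + 1.
features = [0.0, 1.0, 2.0, 3.0, 4.0]
labels = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(features, labels)
print(round(w, 3), round(b, 3))
```

With these settings the fitted parameters should land very close to w = 2 and b = 1, the values used to generate the labels.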
What is a Model for training?
A model can be any one of the following –
• Mathematical equation
• Relational diagrams like graphs/trees
• Logical if/else rules
• Groupings called clusters
4. Identify the purpose and use of classification and
regression in various ML algorithms.
| Algorithm | Regression, Classification | Purpose | Method | Use Cases |
|---|---|---|---|---|
| Linear Regression | Regression | Predict continuous output values | Linear equation minimizing sum of squares of residuals | Predicting continuous values |
| Logistic Regression | Classification | Predict binary output variable | Logistic function transforming linear relationship | Binary classification tasks |
| Decision Trees | Both | Model decisions and outcomes | Tree-like structure with decisions and outcomes | Classification and regression tasks |
| Random Forests | Both | Improve classification and regression accuracy | Combining multiple decision trees | Reducing overfitting, improving prediction accuracy |
| SVM | Both | Create hyperplane for classification or predict continuous values | Maximizing margin between classes or predicting continuous values | Classification and regression tasks |
| KNN | Both | Predict class or value based on k closest neighbors | Finding k closest neighbors and predicting based on majority or average | Classification and regression tasks, sensitive to noisy data |
| Gradient Boosting | Both | Combine weak learners to create strong model | Iteratively correcting errors with new models | Classification and regression tasks to improve prediction accuracy |
| Naive Bayes | Classification | Predict class based on feature independence assumption | Bayes' theorem with feature independence assumption | Text classification, spam filtering, sentiment analysis, medical |
5. Describe the key concepts of reinforcement
learning.
Key Concepts of Reinforcement Learning
▪ Agent: The learner or decision-maker.
▪ Environment: Everything the agent interacts with.
▪ State: A specific situation in which the agent finds itself.
▪ Action: A move the agent can make in a given state.
▪ Reward: Feedback from the environment based on the action
taken.
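The five concepts above can be seen together in a small tabular Q-learning sketch. Everything specific here is an assumption for illustration: the five-state corridor environment, the reward of 1 at the goal, the purely random exploration, and the values of `alpha` and `gamma`. The slides do not name a particular reinforcement learning algorithm.

```python
import random

N_STATES = 5          # states 0..4 laid out in a corridor; state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Environment: apply the action and return (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Agent: learn action values by trial and error (tabular Q-learning)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.choice(ACTIONS)          # explore with random actions
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the estimate toward reward + discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q_values = train()
# Greedy policy: in each non-goal state, pick the action with the highest value.
policy = [max(ACTIONS, key=lambda a: q_values[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should choose "move right" (+1) in every non-goal state, since that is the shortest path to the reward.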
6. Identify the challenges of machine learning.
Challenges of Machine Learning
▪ Ill-posed problems – problems whose specifications are not
clear
▪ Huge data
▪ Huge computation power
▪ Complexity of algorithms
▪ Bias-variance trade-off
• Poor Quality of Data.
• Underfitting of Training Data.
• Overfitting of Training Data.
• Machine Learning is a Complex Process.
• Lack of Training Data.
• Slow Implementation.
• Imperfections in the Algorithm When Data Grows.
7. Discuss the applications of machine learning.
Machine Learning Applications
• Image Recognition
• Speech Recognition
• Recommender Systems
• Fraud Detection
• Self-Driving Cars
• Medical Diagnosis
• Stock Market Trading
8. Describe big data and mention the characteristics
of big data.
Module 1 – Understanding Data – 1
BIG DATA
Big data is a collection of extremely large and complex
data sets that are difficult to store, process, and
analyze. It can include structured, unstructured, and
semi-structured data.
Characteristics
Volume: Big data is huge in size.
Velocity: Big data is generated at high speed.
Variety: Big data can include a variety of data types, such as
structured, semi-structured, and unstructured data.
9. Discuss the big data framework.
Big Data Analysis Framework
A big data analysis framework in machine learning is a
structured approach to managing and analyzing large amounts
of data using machine learning algorithms.
What are Big Data Frameworks?
❑ Big data frameworks are software ecosystems that facilitate the
management, processing, and analysis of vast and complex data sets.
These toolkits provide capabilities for:
• Efficiently storing enormous amounts of information across distributed
systems.
• Processing different kinds of data such as structured, semi-structured, and
unstructured.
• Analyzing data by using advanced methods so as to discover hidden
patterns or trends.
• Managing diverse applications, sources, and tools through a holistic approach.
Key factors to consider when selecting a big data framework:
▪ Processing needs: Batch processing, real-time processing, or a
combination of both?
▪ Data types: Structured, semi-structured, or unstructured data?
▪ Scalability: Ability to handle growing data volumes?
▪ Integration: Compatibility with existing data infrastructure and tools?
▪ Technical expertise: In-house skillset for framework implementation and
maintenance?
Big Data Framework Tools
1. Apache Spark: The Versatile Powerhouse
2. Apache Flink: The Real-Time Champion
3. Apache Kafka: The Real-Time Data Stream Maestro
4. Presto: The Interactive SQL Powerhouse for Big Data
5. Apache HBase: The Scalable NoSQL Database
6. Apache Phoenix: Bridging the SQL Gap for HBase
7. Apache Drill: An Alternative for Fast, Interactive SQL
10. Mention the various file types used for data storage.
Data Storage
➢ Comma-separated values (CSV) is a text file format that uses
commas to separate values, and newlines to separate records. A
CSV file stores tabular data (numbers and text) in plain text,
where each line of the file typically represents one data record.
➢ CSV is a delimited data format that has fields/columns separated
by the comma character and records/rows terminated by
newlines.
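A short sketch of reading CSV data with Python's standard `csv` module follows. The file content, the column names `name` and `marks`, and the values are hypothetical; an in-memory string stands in for a real file.

```python
import csv
import io

# Hypothetical CSV content: a header line followed by one record per line.
csv_text = "name,marks\nAsha,82\nRavi,75\n"

reader = csv.DictReader(io.StringIO(csv_text))
records = list(reader)
print(records[0]["name"], records[0]["marks"])

# Values are read as text, so numeric fields need explicit conversion.
total_marks = sum(int(row["marks"]) for row in records)
print(total_marks)  # 157
```

Each line of the input becomes one record, and the commas determine where one field ends and the next begins.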
➢ TSV stands for tab-separated values, a file format that uses tabs to
separate data fields. It's a simple way to store data in plain text,
and it's commonly used for exchanging data between
spreadsheets, databases, and word processors.
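Tab-separated data can be read with the same standard `csv` module by changing the delimiter. The sample content below is hypothetical; it includes a field containing a comma to show that commas need no special handling when tabs separate the fields.

```python
import csv
import io

# Hypothetical TSV content: fields are separated by tab characters.
tsv_text = "item\tprice\nPen, blue\t10\nNotebook\t45\n"

rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
for row in rows:
    print(row)
```

The second row is read as two fields, `Pen, blue` and `10`, because only the tab acts as a separator.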
11. Discuss the types of big data analytics.
Big Data Analytics and Types of Analytics
Types of Data Analytics
There are four major types of data analytics:
1. Predictive (forecasting)
2. Descriptive (business intelligence and data mining)
3. Prescriptive (optimization and simulation)
4. Diagnostic analytics
Descriptive Statistics
Descriptive statistics help summarize and organize data so it
becomes more understandable.
Types of Descriptive Statistics
1. Measures of Central Tendency: describe the central position
within a dataset
2. Measures of Variability: describe how the data is dispersed
3. Measures of Frequency Distribution: describe how the data is
distributed
1. Measures of Central Tendency
2. Measure of Variability
Understanding Data Dispersion
Range = Largest data value – smallest data value
Variance: average squared deviation from the mean
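The two dispersion measures above can be written out in standard notation. This is a sketch under assumptions: the symbols x_i, N, and μ are introduced here, and the population form (dividing by N rather than N − 1) is assumed because the slide does not say which form is intended. The standard deviation is included as the square root of the variance.

```latex
\text{Range} = x_{\max} - x_{\min}
\qquad
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
\qquad
\sigma = \sqrt{\sigma^2}
```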
3. Measures of Frequency Distribution
How data points are distributed across different categories or
intervals. Helps identify patterns, outliers and the overall
structure of the dataset.
12. Write the differences between mean, median,
and mode.
Differences between Mean, Median and Mode.
| Feature | Mean | Median | Mode |
|---|---|---|---|
| Definition | Mean is the average of all values. | Median is the middle value when data is sorted. | Mode is the most frequently occurring value in the dataset. |
| Sensitivity | Mean is sensitive to outliers. | Median is not sensitive to outliers. | Mode is not sensitive to outliers. |
| Calculation | Calculated by adding up all values of a dataset and dividing the sum by the total number of values in the dataset. | Calculated by finding the middle value in a sorted list of data (the average of the two middle values when the count is even). | Calculated by finding which value occurs most often in a dataset. |
| Representation | Value of mean may or may not be in the dataset. | Value of median is a value from the dataset when the count is odd; when the count is even it may not be in the dataset. | Value of mode is always a value from the dataset. |
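The sensitivity row can be checked with a small experiment using Python's standard `statistics` module. The salary figures are hypothetical and chosen only so that one extreme value is added to an otherwise tight dataset.

```python
import statistics

# Hypothetical salaries (in thousands); the second list adds one extreme outlier.
salaries = [30, 32, 32, 35, 38]
with_outlier = salaries + [500]

for label, data in (("original", salaries), ("with outlier", with_outlier)):
    print(label,
          "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "mode:", statistics.mode(data))
```

Adding the single outlier moves the mean from 33.4 to above 111, while the median only shifts from 32 to 33.5 and the mode stays at 32.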
13. Mention the various types of data analysis and
explain univariate data analysis and data
visualization.
Types of Data: Based on Variables
Univariate data analysis
▪ Univariate data analysis is a statistical method that analyzes
data with a single variable. It's the simplest form of data
analysis.
▪ The main goal is to describe the data, summarize it, and find
patterns.
▪ Example: Heights (in cm): 164, 167.3, 170, 174.2, 178, 180, 186
▪ There are three main types of univariate analyses: calculations
of frequencies, central tendency, and dispersion.
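A frequency calculation on the heights example can double as a simple univariate visualization. The sketch below prints a text histogram; the bin width of 10 cm is an assumption made for illustration.

```python
from collections import Counter

# Heights in cm, taken from the example above.
heights = [164, 167.3, 170, 174.2, 178, 180, 186]
BIN_WIDTH = 10  # assumed bin width in cm

# Map each height to the lower edge of its bin, then count per bin.
bin_counts = Counter(int(h // BIN_WIDTH) * BIN_WIDTH for h in heights)

for lower in sorted(bin_counts):
    upper = lower + BIN_WIDTH
    print(f"{lower}-{upper} cm | " + "#" * bin_counts[lower])
```

Each row shows one interval and one `#` per observation, so the 170-180 cm bin, which holds three of the seven heights, appears as the longest bar.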
Univariate Data Visualization
14. For the given univariate dataset S = {160, 161,
167, 169, 170, 172, 174, 175, 177, 181} of
marks, find the mean, median, mode, standard
deviation, and variance.
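One way to check the answer to this question is with Python's standard `statistics` module. The question does not say whether the population or the sample form of variance is expected, so the sketch computes both; treating the population form as the default is an assumption.

```python
import statistics
from collections import Counter

S = [160, 161, 167, 169, 170, 172, 174, 175, 177, 181]

mean = statistics.mean(S)          # sum of values / number of values
median = statistics.median(S)      # average of the two middle values (n is even)
counts = Counter(S)
# A value counts as a mode only if it occurs more than once.
modes = [value for value, count in counts.items() if count > 1]

population_variance = statistics.pvariance(S)   # divides by n
population_std = statistics.pstdev(S)
sample_variance = statistics.variance(S)        # divides by n - 1
sample_std = statistics.stdev(S)

print(mean, median, modes)
print(round(population_variance, 2), round(population_std, 2))
print(round(sample_variance, 2), round(sample_std, 2))
```

Worked by hand: the sum is 1706, so the mean is 1706 / 10 = 170.6. The two middle values are 170 and 172, so the median is 171. No value repeats, so the dataset has no mode. The squared deviations from the mean sum to 402.4, giving a population variance of 40.24 (standard deviation about 6.34) or a sample variance of about 44.71 (standard deviation about 6.69).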
THE END