National College of Business Administration
Data Science
Prof. Shanzay Khan
Group Members
Faizan Naeem: BSCS(V)-F24-81
Sadia Mushtaq: BSCS(V)-F24-82
Areesha Ishtiaq: BSCS(V)-F24-80
Qurat ul Ain: BSITM(V)-F24-01
Fariha Moeen: BSCS(V)-F24-14
Husnain Haider: BSCS(V)-F24-64
Talha Akram: BSCS(V)-F24-109
Machine Learning:
Machine learning (ML) is a subset of artificial intelligence (AI) that allows machines to learn
patterns from data and make decisions or predictions without needing to be explicitly
programmed.
Types of Machine Learning:
Supervised Machine Learning
Unsupervised Machine Learning
Supervised Machine Learning:
Supervised Machine Learning is a method where a model is trained using labeled
data, meaning the input data comes with known outputs. The model learns the
relationship between inputs and outputs to make accurate predictions on new, unseen
data.
Types of Supervised Machine Learning:
Classification
Regression
Classification:
Classification is used to assign data to predefined categories, or to estimate how likely it is
that an item belongs to a category (for example, how likely an email is to be spam).
Classification Algorithms:
Support Vector Machines (SVM)
K-Nearest Neighbors (KNN)
Logistic Regression
Naive Bayes
Decision Trees
Support Vector Machine (SVM):
SVM is a supervised machine learning algorithm used for solving
classification and regression problems. It identifies the optimal hyperplane that maximizes the
margin between different classes of data points. Applications include image recognition, text
categorization, and bioinformatics.
Key Concepts in SVM:
Hyperplane: A decision boundary that separates data into different classes in a feature
space.
Margin: The distance between the hyperplane and the nearest data points from each
class.
Support Vectors: The data points closest to the hyperplane, crucial for determining its
position.
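The three concepts above can be made concrete with a small sketch. The data points, the weight vector w, and the bias b below are hypothetical values chosen by hand for illustration; they are not produced by a trained model. The code measures each point's distance to the hyperplane, takes the smallest distance as the margin, and reports the points lying at that distance as the support vectors.

```python
import math

# Hypothetical 2-D toy data with labels -1 / +1.
points = [(1.0, 1.0), (0.0, 0.0), (3.0, 3.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Assumed hyperplane w.x + b = 0, picked by hand for this data (not learned).
w = (0.5, 0.5)
b = -2.0

def signed_distance(point, w, b):
    """Signed geometric distance from a point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm

# Multiplying by the label makes the distance positive for correctly
# classified points.
distances = [label * signed_distance(p, w, b) for p, label in zip(points, labels)]

# Margin: distance from the hyperplane to the nearest data point.
margin = min(distances)

# Support vectors: the points that lie exactly at the margin.
support_vectors = [p for p, d in zip(points, distances) if math.isclose(d, margin)]

print(support_vectors)  # [(1.0, 1.0), (3.0, 3.0)]
```

Only the two closest points determine where the hyperplane sits; moving the other two slightly would not change the margin.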
Kernel Trick:
The kernel trick lets SVM handle non-linear relationships by working as if the data had
been mapped into a higher-dimensional space. A kernel function computes the similarity (inner
product) of two points in that space directly from the original features, so the
transformation itself is never computed explicitly.
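As a minimal sketch of this idea, the example below compares two ways of getting the same number for a degree-2 polynomial kernel on 2-D inputs. The feature map phi and the sample points are assumptions chosen for illustration. One path maps both points into 3-D and takes a dot product; the other applies the kernel to the original 2-D points.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map for the kernel (x.z)^2 on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, 4.0)

explicit = dot(phi(x), phi(z))   # map to 3-D first, then take the dot product
implicit = poly2_kernel(x, z)    # never leaves 2-D

print(implicit)                          # 121.0
print(math.isclose(explicit, implicit))  # True
```

For kernels such as RBF the corresponding feature space is infinite-dimensional, so the kernel route is the only practical one.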
Common Kernel Functions:
Linear Kernel: For linearly separable data.
Polynomial Kernel: Captures polynomial relationships.
Radial Basis Function (RBF) or Gaussian Kernel: Effective for complex, non-linear
relationships.
Sigmoid Kernel: Based on the tanh function, so it behaves like a neural-network activation.
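The four kernels listed above can be written as short functions. This is a sketch using their common textbook forms; the parameter names gamma, degree, and coef0 and their default values are assumptions chosen for illustration, not values taken from this report.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(x, z):
    # Plain dot product: suitable for linearly separable data.
    return dot(x, z)

def polynomial_kernel(x, z, degree=3, coef0=1.0):
    # (x.z + coef0)^degree: captures polynomial relationships.
    return (dot(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    # exp(-gamma * ||x - z||^2): similarity decays with squared distance.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, z, gamma=0.1, coef0=0.0):
    # tanh(gamma * x.z + coef0): shaped like a neural-network activation.
    return math.tanh(gamma * dot(x, z) + coef0)

x, z = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(x, z))                  # 11.0
print(polynomial_kernel(x, z, degree=2))    # 144.0
print(rbf_kernel(x, x))                     # 1.0
```

The RBF kernel returns 1.0 for identical points and values approaching 0 for points that are far apart, which is why gamma controls how local the decision boundary becomes.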
Benefits of Kernels: Kernels enable SVM to tackle complex problems, such as image
classification and text processing, without explicitly transforming data into higher dimensions.
How Does SVM Work?
1. Input Data:
SVM starts with labeled data, meaning the data already has known groups or categories (e.g.,
"spam" vs. "not spam").
2. Linear SVM (Simple Data):
If the data can be separated with a straight line, SVM finds the line that creates the widest
margin between groups.
3. Non-linear SVM (Complex Data):
If the data isn’t linearly separable, SVM uses a kernel to transform it into a space where it can
be separated.
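4. Optimization:
SVM solves an optimization problem to find the best hyperplane by balancing two goals:
Maximizing the margin.
Minimizing mistakes in classification.
A parameter called C controls this balance: a large C penalizes misclassified points heavily
and gives a narrower margin, while a small C tolerates some mistakes in exchange for a wider
margin.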
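The balance described in step 4 can be illustrated with the standard soft-margin objective, 0.5 * ||w||^2 + C * (sum of hinge losses). The sketch below is not a solver: the 1-D data and the two candidate hyperplanes are hypothetical and picked by hand. It only shows how the value of C changes which candidate scores better.

```python
# Hypothetical 1-D toy data; the point at x = -1 is a positive example
# sitting close to the negative group.
xs = [-3.0, -2.0, -1.0, 2.0, 3.0]
ys = [-1, -1, 1, 1, 1]

def objective(w, b, C):
    """Soft-margin objective: 0.5 * w^2 + C * sum of hinge losses."""
    hinge = sum(max(0.0, 1.0 - y * (w * x + b)) for x, y in zip(xs, ys))
    return 0.5 * w * w + C * hinge

# Two hand-picked candidate hyperplanes (w, b), not the output of a solver.
wide = (0.5, 0.0)    # wide margin, but misclassifies the point at x = -1
narrow = (2.0, 3.0)  # classifies every point correctly, with a narrow margin

def preferred(C):
    """Name of the candidate with the lower objective value for this C."""
    return "wide" if objective(*wide, C) < objective(*narrow, C) else "narrow"

print(preferred(0.1))   # wide
print(preferred(10.0))  # narrow
```

With a small C the single mistake is cheap, so the wide margin wins; with a large C the mistake dominates the objective and the narrow, error-free hyperplane is preferred.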
Why Use SVM?
1. High Accuracy:
Works well with data that has clear boundaries.
2. Avoids Overfitting
SVM tends to generalize well, even for high-dimensional data, because maximizing the margin
keeps the model from becoming overly complex.
3. Flexible Kernels
Can handle both simple and complex data using different kernels.
4. Effective with Small Datasets
Works well when there’s not much data but the features (variables) are meaningful.
Challenges with SVM
1. Takes Time with Large Datasets:
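SVM training is slow for very large datasets: the cost of kernel SVM training grows faster than
linearly with the number of samples, and many features add to the cost of each kernel evaluation.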
2. Requires Tuning
You need to carefully adjust parameters like C and the kernel type for the best results.
3. Struggles with Overlapping Data
If the groups overlap too much, SVM might not perform well.
What Can SVM Be Used For?
1. Classification Tasks:
Spam detection (spam vs. not spam).
Image recognition (e.g., identifying handwritten digits).
Disease diagnosis (e.g., detecting cancer).
2. Regression Tasks:
Predicting prices (e.g., house prices).
Forecasting trends (e.g., stock prices).
3. Outlier Detection
Fraud detection in transactions.
Finding anomalies in security systems.
Why Choose SVM?
If you have small or medium datasets with clear groups, SVM is an excellent choice.
It’s reliable, flexible, and works well even for complex datasets.