Demystifying Supervised Learning Algorithms
Welcome to this introductory guide on supervised learning, a core
concept in data science. We'll explore fundamental algorithms that
empower machines to learn from data and make predictions.
Chapter 1
Linear Regression: Predicting Continuous Values
What it is
A supervised learning algorithm for predicting continuous numerical values.

How it works
Finds the "best-fit line" that minimizes the error between predicted and actual data points, represented by y = mx + c.

Key components
Dependent Variable (y): The output you want to predict.
Independent Variable (x): The input feature(s).
Slope (m): The rate of change.
Y-intercept (c): Where the line crosses the y-axis.

Types
Simple: One independent variable.
Multiple: Many independent variables.
Real-world Use: Predicting house prices based on size, or sales revenue based on advertising spend.
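The best-fit line can be computed in closed form with ordinary least squares: the slope is the covariance of x and y divided by the variance of x. A minimal sketch, using made-up house-size and price numbers purely for illustration:

```python
import numpy as np

# Toy data: house size (sq m) vs. price (hypothetical numbers for illustration)
x = np.array([50.0, 70.0, 90.0, 110.0, 130.0])
y = np.array([150.0, 200.0, 260.0, 310.0, 370.0])

# Ordinary least squares for y = m*x + c:
#   m = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   c = mean_y - m * mean_x
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()

print(f"slope m = {m:.2f}, intercept c = {c:.2f}")
print(f"predicted price for 100 sq m: {m * 100 + c:.1f}")
```

The same formula generalizes to multiple regression, where libraries typically solve the matrix form of the normal equations instead.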
Logistic Regression: Classifying Outcomes
What it is
A supervised learning algorithm for classification problems, predicting the
probability of an event occurring (0 or 1).
How it works
It outputs probabilities between 0 and 1, then uses a threshold (e.g., 0.5) to
classify outcomes into categories.
Outcome types
Binary Classification: Two possible outcomes (e.g., spam/not spam, yes/no).
Multi-class Classification: More than two outcomes (e.g., different disease
types).
Real-world Use: Spam detection in emails, predicting the
likelihood of a patient having a disease like diabetes.
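The probability-then-threshold step can be sketched with the sigmoid function. The weight, bias, and "spam-word count" feature below are invented for illustration, not learned from data:

```python
import math

def sigmoid(z):
    # Maps any real-valued score to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def classify(x, w, b, threshold=0.5):
    # Linear score -> probability -> class label via the threshold
    p = sigmoid(w * x + b)
    return (1 if p >= threshold else 0), p

# Hypothetical model for "is this email spam?" given a spam-word count x
label, prob = classify(x=8, w=0.9, b=-4.0)
print(label, round(prob, 3))  # 1, probability about 0.96
```

In practice the weight and bias are fitted by maximizing the likelihood of the training labels; only the thresholding step shown here stays the same.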
Decision Trees: Flowchart for Decisions
What it is
A supervised learning algorithm that creates a tree-like model of decisions and their possible consequences.

How it works
It splits data based on decisions, much like a flowchart, until it reaches a final outcome (leaf node).

[Figure: a weather flowchart. "Is it sunny?" leads to "Wear sunglasses (protect eyes from sun)"; "Is it raining?" leads to "Take umbrella (prepare for rain)"; otherwise, "Check current weather".]
Components
Root Node: The initial decision.
Branches: Paths based on conditions.
Leaf Nodes: Final outcomes.
Real-world Use: Credit approval systems,
medical diagnosis, customer segmentation.
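The weather flowchart from the figure can be encoded directly as a tiny tree of yes/no questions; this hand-built sketch shows only the traversal logic, not how a tree is learned from data:

```python
class Node:
    """A decision-tree node: either a question (internal) or an answer (leaf)."""
    def __init__(self, question=None, yes=None, no=None, outcome=None):
        self.question, self.yes, self.no, self.outcome = question, yes, no, outcome

def decide(node, answers):
    # Walk from the root toward a leaf, following yes/no branches
    while node.outcome is None:
        node = node.yes if answers[node.question] else node.no
    return node.outcome

# The weather flowchart, encoded by hand
tree = Node(
    question="is it sunny?",
    yes=Node(outcome="wear sunglasses"),
    no=Node(
        question="is it raining?",
        yes=Node(outcome="take umbrella"),
        no=Node(outcome="check current weather"),
    ),
)

print(decide(tree, {"is it sunny?": True}))  # wear sunglasses
print(decide(tree, {"is it sunny?": False, "is it raining?": True}))  # take umbrella
```

Training algorithms such as CART build this structure automatically by choosing, at each node, the split that best separates the training labels.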
Random Forest: Ensemble Power
Multiple trees
Combines many decision trees for improved accuracy and stability.

Majority voting
Final prediction is based on the most common outcome from all trees.

Robustness
Reduces overfitting and variance compared to a single decision tree.
Real-world Use: Fraud detection, stock market prediction, image classification.
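The majority-voting step is simple to sketch. The three "trees" below are stand-in threshold rules for a hypothetical fraud check, not trained models; a real random forest would also train each tree on a random subset of data and features:

```python
from collections import Counter

def forest_predict(trees, x):
    """Majority vote over the predictions of many trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Three toy "trees": each is just a thresholded rule on one feature
trees = [
    lambda x: "fraud" if x["amount"] > 1000 else "ok",
    lambda x: "fraud" if x["country_mismatch"] else "ok",
    lambda x: "fraud" if x["amount"] > 5000 else "ok",
]

print(forest_predict(trees, {"amount": 2000, "country_mismatch": True}))   # fraud (2 of 3 votes)
print(forest_predict(trees, {"amount": 500, "country_mismatch": False}))  # ok (3 of 3 votes)
```

For regression forests, the vote is replaced by averaging the trees' numeric predictions.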
Support Vector Machines (SVM): Optimal Separation
What it is
A powerful supervised learning algorithm for classification that finds the
optimal boundary.
How it works
It constructs a hyperplane (decision boundary) that maximizes the margin
between different classes.
Support Vectors
Data points closest to the hyperplane are called support vectors and are
crucial for defining the boundary.
Real-world Use: Face recognition, text categorization, and handwriting recognition.
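The hyperplane and margin ideas can be shown with a hand-picked boundary. The weights and bias below are chosen for illustration; SVM training would instead select w and b to maximize the smallest distance from any training point to the hyperplane:

```python
import numpy as np

# A hand-picked hyperplane w.x + b = 0 in 2-D (illustrative, not trained)
w = np.array([1.0, 1.0])
b = -3.0

def svm_predict(x):
    # Which side of the hyperplane the point falls on determines its class
    return 1 if np.dot(w, x) + b >= 0 else -1

def margin_distance(x):
    # Geometric distance from a point to the hyperplane |w.x + b| / ||w||;
    # support vectors are the points where this distance is smallest
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

print(svm_predict(np.array([3.0, 3.0])))  # 1
print(svm_predict(np.array([0.0, 1.0])))  # -1
```

Kernel functions extend this picture to boundaries that are non-linear in the original feature space.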
K-Nearest Neighbors (KNN): Learning from Proximity
What it is
A simple, "lazy" supervised learning algorithm used for both classification and
regression.
How it works
Classifies a new data point based on the majority class among its 'k' nearest neighbors; the distance to those neighbors is key.
Lazy Learning
It doesn't build a model during training; it only stores the dataset and computes
on demand for prediction.
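Because KNN just stores the data, the whole algorithm fits in a few lines: measure distances, take the k closest, and vote. A minimal sketch with a toy 2-D dataset:

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of ((x, y), label) pairs. Vote among the k nearest points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    labels = [label for _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]

# Toy 2-D dataset: two loose clusters with labels "A" and "B"
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]

print(knn_classify(train, (2, 2)))  # "A": the nearest neighbors are all class A
print(knn_classify(train, (6, 5)))  # "B"
```

Note that every prediction scans the full training set, which is exactly the cost of "lazy" learning; spatial indexes like k-d trees speed this up in practice.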
Chapter 2
Unsupervised Learning: Discovering Hidden
Patterns
K-Means Clustering: Groups similar data points into 'k' distinct clusters.

Principal Component Analysis (PCA): Reduces the number of features (dimensions) while preserving essential information.

Anomaly Detection: Identifies rare or unusual data points that deviate from the norm.
Unlike supervised learning, unsupervised learning works with unlabeled data to find inherent structures or relationships.
K-Means Clustering: Grouping Similar Data
What it is: An unsupervised learning algorithm that partitions data into 'K' distinct clusters, where 'K' is predefined.
01 Initialize centroids
Randomly place K initial centroids in the data space.

02 Assign points
Assign each data point to the nearest centroid.

03 Update centroids
Recalculate each centroid as the mean of all points assigned to its cluster.

04 Repeat
Iterate the assignment and update steps until centroids no longer move significantly.
Real-world Use: Customer segmentation for marketing, image compression by grouping similar colors.
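The four steps above translate almost line for line into code. A minimal NumPy sketch on a toy dataset with two obvious clusters (a fixed random seed keeps the initialization reproducible):

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialize: pick k distinct data points as starting centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # 2. Assign: each point goes to its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update: move each centroid to the mean of its assigned points
        new = np.array([points[labels == j].mean(axis=0) for j in range(k)])
        # 4. Repeat until the centroids stop moving
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Two obvious clusters, around (0, 0) and around (10, 10)
pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(pts, k=2)
print(labels)  # first three points share one label, last three share the other
```

K-means converges to a local optimum, so production code typically runs several random restarts and keeps the clustering with the lowest total within-cluster distance.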
Dimensionality Reduction & Anomaly Detection
Principal Component Analysis (PCA) Anomaly Detection