
Key Machine Learning Terminologies and Their Explanations
This study guide extracts and defines the core terms introduced in “Hands-On Machine
Learning with Scikit-Learn, Keras, and TensorFlow” (2nd Ed.), organized by theme for clarity.

1. Supervised Learning
Training Set
A labeled dataset used to fit (train) a model; each example includes input features and the
correct output label.
Features (Attributes)
Measurable properties or characteristics of the data (e.g., petal length, income).
Labels (Targets)
The desired outputs in supervised learning (e.g., price, class).
Regression
Predicting a continuous quantity (e.g., house price, life satisfaction).
Classification
Predicting discrete categories (e.g., spam vs. ham, digit 0–9).
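The two supervised task types can be contrasted in a few lines of scikit-learn, the library the book uses. This is a minimal sketch on made-up toy data (the values and threshold are illustrative, not from the book):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a continuous quantity (here y = 2x + 1, noise-free).
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y_cont = np.array([3.0, 5.0, 7.0, 9.0])
reg = LinearRegression().fit(X, y_cont)
print(reg.predict([[5.0]])[0])  # ≈ 11.0

# Classification: predict a discrete category (0 below a threshold, 1 above).
y_cls = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y_cls)
print(clf.predict([[4.0]])[0])  # 1
```

Note that both models consume the same feature matrix X; only the type of label (continuous vs. discrete) changes the task.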

2. Unsupervised Learning
Clustering
Grouping similar instances without labels (e.g., K-Means, DBSCAN).
Dimensionality Reduction
Reducing the number of features while preserving structure (e.g., PCA, Kernel PCA, LLE).
Anomaly Detection
Identifying unusual instances that deviate from the norm (e.g., One-class SVM, Isolation
Forest).
Association Rule Learning
Discovering relationships between variables in large datasets (e.g., Apriori).
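Clustering and dimensionality reduction can both be sketched with scikit-learn. The two "blobs" below are invented toy data; the point is only that K-Means recovers the grouping with no labels, and PCA compresses the features:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Two obvious blobs: points near (0, 0) and points near (10, 10).
X = np.array([[0, 0], [0.5, 0.2], [0.1, 0.4],
              [10, 10], [10.2, 9.8], [9.9, 10.1]])

# Clustering: group similar instances without any labels.
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print(km.labels_)  # first three points share one label, last three the other

# Dimensionality reduction: project 2-D data onto its top principal component.
X_1d = PCA(n_components=1).fit_transform(X)
print(X_1d.shape)  # (6, 1)
```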

3. Model Evaluation
Overfitting
A model fits the training data too closely and fails to generalize to new data.
Underfitting
A model is too simple and cannot capture underlying patterns in the data.
Bias–Variance Trade-off
Balancing underfitting (high bias) vs. overfitting (high variance).
Cross-Validation
Partitioning data into folds to reliably estimate generalization error.
Holdout Test Set
A final subset of data (e.g., 20%) held aside to assess the model after training.
Precision / Recall
– Precision: TP/(TP + FP), the fraction of positive predictions that are correct.
– Recall: TP/(TP + FN), the fraction of actual positives correctly identified.
F₁ Score
The harmonic mean of precision and recall: F₁ = 2 × (precision × recall) / (precision + recall).
Receiver Operating Characteristic (ROC) Curve
Plots True Positive Rate vs. False Positive Rate at various thresholds.
Area Under the ROC Curve (AUC)
A summary scalar of ROC performance; 1.0 is perfect, 0.5 is random.
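The metrics above map directly onto `sklearn.metrics`. A small worked example (the labels and scores below are made up so the counts are easy to check by hand):

```python
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

y_true  = [0, 0, 1, 1, 1, 0]               # ground-truth labels
y_pred  = [0, 1, 1, 1, 0, 0]               # hard predictions: TP=2, FP=1, FN=1
y_score = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2]   # probability scores for the ROC curve

print(precision_score(y_true, y_pred))  # 2/(2+1) ≈ 0.667
print(recall_score(y_true, y_pred))     # 2/(2+1) ≈ 0.667
print(f1_score(y_true, y_pred))         # harmonic mean of the two ≈ 0.667
print(roc_auc_score(y_true, y_score))   # 8/9 ≈ 0.889: one positive outranked by one negative
```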

4. Training Algorithms
Normal Equation
Direct closed-form solution for Linear Regression: θ̂ = (XᵀX)⁻¹ Xᵀ y.
Singular Value Decomposition (SVD)
Factorizes X to compute the pseudoinverse X⁺ for regression (θ̂ = X⁺y) when XᵀX is singular.
Gradient Descent (GD)
Iterative optimization: θ ← θ − η ∇θMSE(θ).
– Batch GD: uses all instances per step.
– Stochastic GD: uses one instance per step.
– Mini-batch GD: uses small batches per step.
Learning Rate (η)
Step size in gradient descent; too small slows convergence, too large may diverge.
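The batch GD update can be written out in a few lines of NumPy. This is a sketch on noise-free toy data (y = 4 + 3x; the true parameters are chosen so convergence is easy to verify):

```python
import numpy as np

# Batch gradient descent for linear regression on y = 4 + 3x.
rng = np.random.default_rng(42)
X = rng.uniform(0, 2, size=(100, 1))
y = 4 + 3 * X[:, 0]

X_b = np.c_[np.ones((100, 1)), X]  # add a bias column of ones
eta = 0.1                          # learning rate η
theta = np.zeros(2)
for _ in range(2000):
    gradients = 2 / 100 * X_b.T @ (X_b @ theta - y)  # ∇θ MSE(θ), over ALL instances
    theta -= eta * gradients                         # θ ← θ − η ∇θ MSE(θ)

print(theta.round(2))  # ≈ [4. 3.]
```

Stochastic and mini-batch GD change only which rows of `X_b` enter the gradient at each step; the update rule itself is identical.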
Regularization
Constraining model parameters to reduce overfitting:
– Ridge (ℓ₂) Regression: adds α Σᵢ θᵢ² to the cost function.
– Lasso (ℓ₁) Regression: adds α Σᵢ |θᵢ|, promotes sparsity.
– Elastic Net: mix of ℓ₁ and ℓ₂ penalties.
Early Stopping
Ceasing training when validation error stops improving to prevent overfitting.
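The qualitative difference between the ℓ₂ and ℓ₁ penalties shows up directly in fitted weights. A minimal sketch, assuming invented data where only the first of three features matters (the α values are arbitrary, chosen to make the effect visible):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Toy data: y depends only on x0; x1 and x2 are pure noise features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0]

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print(ridge.coef_.round(3))  # all weights shrunk, none exactly zero
print(lasso.coef_.round(3))  # irrelevant weights driven exactly to zero (sparsity)
```

This is why Lasso can act as a built-in feature selector, while Ridge merely dampens all weights smoothly.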

5. Support Vector Machines (SVM)


Hyperplane
Decision boundary separating classes; in n-dimensional space it is (n − 1)-dimensional.
Margin
Distance between the hyperplane and the nearest training instances.
Support Vectors
Training instances that lie on the margin edges, which determine the hyperplane.
Hard Margin
Assumes perfect separability; no margin violations allowed.
Soft Margin
Allows some margin violations via slack variables and the penalty hyperparameter C.
Kernel Trick
Computing dot products in transformed feature spaces without explicit mapping; e.g.:
– Polynomial Kernel: K(a, b) = (γ aᵀb + r)ᵈ.
– Gaussian RBF: K(a, b) = exp(−γ ‖a − b‖²).
Dual Problem
Reformulation expressing optimization in terms of instance weights αᵢ, enabling kernels.
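The ideas above come together in scikit-learn's `SVC`. A minimal sketch on an XOR-style toy layout (linearly inseparable, so a linear hyperplane fails but the RBF kernel succeeds; the C and gamma values are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# XOR-style data: no straight line separates the two classes.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 0, 1, 1])

# C is the soft-margin penalty; gamma is the RBF kernel's γ.
clf = SVC(kernel="rbf", C=10.0, gamma=1.0).fit(X, y)

print(clf.predict([[0.1, 0.1], [0.9, 0.1]]))  # → [0 1]
print(len(clf.support_vectors_))  # the instances that determine the boundary
```

Thanks to the kernel trick, the separating hyperplane lives in the RBF feature space; no explicit mapping is ever computed.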

6. Decision Trees
Node Impurity
– Gini Impurity: G = 1 − Σᵢ pᵢ².
– Entropy: H = −Σᵢ pᵢ log₂(pᵢ).
CART Algorithm
Binary tree induction by minimizing weighted impurity of splits.
Max Depth / Min Samples Split / Min Samples Leaf
Hyperparameters controlling tree growth to avoid overfitting.
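Both the impurity formula and the growth-limiting hyperparameters are easy to check concretely. A sketch with made-up labels (3 of class 0, 2 of class 1) and a trivially separable 1-D dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Gini impurity of a node by hand: G = 1 − Σ pᵢ²
labels = np.array([0, 0, 0, 1, 1])          # class proportions 0.6 and 0.4
p = np.bincount(labels) / len(labels)
gini = 1 - np.sum(p ** 2)
print(round(gini, 2))  # 1 − (0.6² + 0.4²) = 0.48

# max_depth=1 allows a single CART split, curbing overfitting.
X = np.array([[1], [2], [3], [10], [11], [12]])
y = np.array([0, 0, 0, 1, 1, 1])
tree = DecisionTreeClassifier(max_depth=1, random_state=42).fit(X, y)
print(tree.predict([[2.5], [10.5]]))  # → [0 1]
```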

7. Ensemble Methods
Bagging (Bootstrap Aggregation)
Training predictors on bootstrap-sampled subsets and aggregating votes/predictions.
Random Forests
Bagged Decision Trees with feature subsampling at each split for extra diversity.
Extra-Trees
Like Random Forests but with random split thresholds as well.
Boosting
Sequentially training predictors to correct predecessors’ errors, e.g., AdaBoost, Gradient
Boosting.
Stacking
Training a meta-learner (blender) on base learners’ predictions to optimally combine them.
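Bagging and Random Forests can be compared side by side on the moons dataset (a common choice in the book; the hyperparameter values below are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Bagging: 200 trees, each trained on a bootstrap sample of the training set.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=200,
                        random_state=42).fit(X_tr, y_tr)

# Random Forest: bagging plus random feature subsampling at each split.
rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_tr, y_tr)

print(bag.score(X_te, y_te), rf.score(X_te, y_te))  # typically ≈ 0.85–0.92 here
```

Because the individual trees are decorrelated by bootstrapping (and, in the forest, by feature subsampling), averaging their votes lowers variance relative to any single deep tree.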

Practice Flashcards
1. What is the bias–variance trade-off?
Balancing model simplicity (high bias, underfitting) vs. model complexity (high variance,
overfitting).
2. How does ℓ₁ regularization differ from ℓ₂?
ℓ₁ promotes sparse weights (feature selection), ℓ₂ shrinks weights smoothly.
3. What does the kernel trick enable in SVMs?
Applying a linear algorithm in a high-dimensional feature space without explicitly mapping to
it.
4. When do you use early stopping?
To stop training once validation error plateaus or rises, preventing overfitting.
5. How do Random Forests reduce overfitting compared to a single tree?
Aggregating many decorrelated trees lowers variance.
6. Define precision and recall.
Precision = TP/(TP+FP); recall = TP/(TP+FN).
7. What is PCA’s objective?
Find orthogonal axes (principal components) that maximize data variance, then project onto
them.
8. Why scale features before distance-based algorithms?
To ensure all features contribute equally and prevent distortion.
9. What is a confusion matrix?
A table showing counts of TP, TN, FP, FN for binary classification.
10. Describe Bagging vs. Boosting.
Bagging trains models independently on random subsets; boosting trains sequentially,
focusing on previous errors.
