Basic Level
Q: What is supervised learning?
A: Supervised learning is a type of machine learning where the model learns from labeled data,
i.e., inputs paired with their correct outputs.
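Example (a minimal sketch, assuming scikit-learn is installed; the data here is made up for
illustration):

    # Supervised learning: the model is fit on inputs X paired with known labels y.
    from sklearn.linear_model import LogisticRegression

    X = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]  # inputs
    y = [0, 0, 1, 1]                                      # correct outputs (labels)

    model = LogisticRegression()
    model.fit(X, y)                     # learn the input-to-output mapping
    print(model.predict([[1.5, 1.5]]))  # predict the label for a new input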
Q: What are some examples of supervised learning applications?
A: Email spam detection, fraud detection, medical diagnosis, and credit scoring.
Q: What is the difference between supervised and unsupervised learning?
A: Supervised uses labeled data to predict outcomes; unsupervised finds patterns in unlabeled data.
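Example (a sketch assuming scikit-learn; the same toy inputs, but only the supervised model
is given labels):

    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    X = [[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9]]
    y = [0, 0, 1, 1]  # labels exist only in the supervised case

    clf = LogisticRegression().fit(X, y)         # supervised: learns from (X, y) pairs
    km = KMeans(n_clusters=2, n_init=10).fit(X)  # unsupervised: finds structure in X alone
    print(clf.predict(X), km.labels_)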
Q: What types of problems can be solved using supervised learning?
A: Classification (e.g., spam or not) and regression (e.g., predicting house prices).
Q: What are the two main types of supervised learning tasks?
A: Classification and regression.
Intermediate Level
Q: Name some algorithms used in supervised learning.
A: Linear Regression, Logistic Regression, Decision Trees, Random Forest, k-Nearest Neighbors
(KNN), and Support Vector Machines (SVM).
Q: What is the difference between classification and regression?
A: Classification predicts categories; regression predicts continuous values.
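Example (a sketch assuming scikit-learn, with toy data for illustration):

    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    X = [[1], [2], [3], [4]]
    clf = DecisionTreeClassifier().fit(X, ["spam", "spam", "ham", "ham"])  # categories
    reg = DecisionTreeRegressor().fit(X, [100.0, 150.0, 200.0, 250.0])     # continuous values

    print(clf.predict([[1.5]]))  # a category, e.g. 'spam'
    print(reg.predict([[1.5]]))  # a continuous number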
Q: What is overfitting in supervised learning?
A: Overfitting occurs when a model learns the noise in the training data instead of the
underlying pattern, so it performs poorly on new data.
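Example (a sketch assuming scikit-learn, on synthetic data with deliberate label noise; an
unconstrained tree memorizes the training set):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=20, flip_y=0.2, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # no depth limit
    print("train:", tree.score(X_tr, y_tr))  # near 1.0: the noise was memorized
    print("test: ", tree.score(X_te, y_te))  # noticeably lower on unseen data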
Q: How can you avoid overfitting?
A: By using cross-validation, regularization, pruning (in trees), early stopping, and gathering more
data.
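Example (a sketch assuming scikit-learn; limiting tree depth is a simple pruning-style fix
for the overfit tree above):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=20, flip_y=0.2, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
    print("train:", pruned.score(X_tr, y_tr))  # lower than the unpruned tree's
    print("test: ", pruned.score(X_te, y_te))  # usually higher: better generalization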
Q: What is cross-validation?
A: Cross-validation is a technique that repeatedly splits the data into training and validation
sets, training and evaluating the model on each split to check that it generalizes well.
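Example (a sketch assuming scikit-learn; 5-fold cross-validation scores the model on five
different train/validation splits):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
    print(scores, scores.mean())  # consistent fold scores suggest good generalization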
Advanced Level
Q: What is the bias-variance tradeoff?
A: It is the balance between error from overly simple assumptions (bias) and error from
sensitivity to the training data (variance). High-bias models underfit; high-variance models
overfit.
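Example (a sketch assuming scikit-learn and NumPy; polynomial degree controls model complexity,
so cross-validated error traces the tradeoff):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(-3, 3, 80)).reshape(-1, 1)
    y = np.sin(X).ravel() + rng.normal(0, 0.3, 80)  # noisy nonlinear target

    for degree in (1, 4, 15):  # underfit (high bias), balanced, overfit (high variance)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        mse = -cross_val_score(model, X, y, cv=5,
                               scoring="neg_mean_squared_error").mean()
        print(degree, round(mse, 3))  # validation error is lowest near the balance point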
Q: What are the ROC curve and AUC score?
A: The ROC curve plots the true positive rate (sensitivity) against the false positive rate
(1 - specificity) across classification thresholds; AUC is the area under this curve,
summarizing performance in a single number.
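Example (a sketch assuming scikit-learn, on synthetic data):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, roc_curve
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    fpr, tpr, thresholds = roc_curve(y_te, proba)  # points on the ROC curve
    print("AUC:", roc_auc_score(y_te, proba))      # 1.0 = perfect, 0.5 = random guessing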
Q: Explain bagging and boosting.
A: Bagging trains models independently on bootstrap samples and averages their predictions to
reduce variance; boosting trains weak models sequentially, each one correcting its
predecessors' errors, to build a strong model.
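Example (a sketch assuming scikit-learn; BaggingClassifier builds trees independently on
bootstrap samples, while AdaBoostClassifier builds them sequentially):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
    boosting = AdaBoostClassifier(n_estimators=50, random_state=0)  # stumps by default

    for name, model in [("bagging", bagging), ("boosting", boosting)]:
        print(name, cross_val_score(model, X, y, cv=5).mean())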
Q: What is regularization (L1 and L2)?
A: Regularization adds a penalty on coefficient size to the loss function to prevent
overfitting: L1 (Lasso) penalizes absolute values and can zero coefficients out entirely;
L2 (Ridge) penalizes squared values and shrinks them toward zero.
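Example (a sketch assuming scikit-learn; with only a few informative features, L1 tends to
produce exact zeros while L2 only shrinks):

    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, Ridge

    X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                           noise=5.0, random_state=0)

    lasso = Lasso(alpha=1.0).fit(X, y)
    ridge = Ridge(alpha=1.0).fit(X, y)
    print("L1 zeroed coefficients:", (lasso.coef_ == 0).sum())  # usually several exact zeros
    print("L2 zeroed coefficients:", (ridge.coef_ == 0).sum())  # typically none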
Q: How do you handle imbalanced datasets?
A: Use techniques such as oversampling the minority class, undersampling the majority class,
or class weights and algorithms that adjust for imbalance.
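Example (a sketch assuming scikit-learn; class_weight="balanced" is the algorithm-level
adjustment, whereas resampling would change the data itself):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split

    # A 95%/5% class split simulates a heavily imbalanced dataset.
    X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
    print(classification_report(y_te, clf.predict(X_te)))  # check minority-class recall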