Model Evaluation Metrics (Regression):
1. MAE – Mean Absolute Error
Definition: The average of the absolute differences between predicted
and actual values.
Formula:
MAE = (1/n) Σ |yᵢ − ŷᵢ|, where yᵢ is the actual value, ŷᵢ the predicted value, and n the number of observations.
Interpretation: Measures average magnitude of the errors in a set of
predictions, without considering their direction.
Pros: Easy to interpret; less sensitive to outliers than MSE/RMSE.
Cons: Doesn’t penalize large errors as heavily as MSE or RMSE.
2. MSE – Mean Squared Error
Definition: The average of the squared differences between predicted
and actual values.
Formula:
MSE = (1/n) Σ (yᵢ − ŷᵢ)²
Interpretation: Gives more weight to larger errors (squares them), so
it penalizes big mistakes more.
Pros: Good for when large errors are especially undesirable.
Cons: Sensitive to outliers; harder to interpret due to squared units.
3. RMSE – Root Mean Squared Error
Definition: The square root of the MSE.
Formula:
RMSE = √MSE = √[ (1/n) Σ (yᵢ − ŷᵢ)² ]
Interpretation: Similar to MSE but in the same units as the target
variable.
Pros: Easier to interpret than MSE; still penalizes large errors.
Cons: Still sensitive to outliers.
Summary Table:
Metric   Penalizes Large Errors   Same Units as Target    Sensitive to Outliers
MAE      No                       Yes                     Less
MSE      Yes                      No (squared units)      Yes
RMSE     Yes                      Yes                     Yes
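A minimal sketch of computing these three regression metrics with scikit-learn; the y_true and y_pred arrays below are made-up illustrative values, not outputs of any model in these notes.
python
# Illustrative sketch: MAE, MSE, and RMSE with scikit-learn (example values only).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # actual values (made-up)
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # predicted values (made-up)

mae = mean_absolute_error(y_true, y_pred)   # average absolute error
mse = mean_squared_error(y_true, y_pred)    # average squared error
rmse = np.sqrt(mse)                         # back in the target's units

print(mae, mse, rmse)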
Model Evaluation Metrics (Classification): Accuracy, Precision, Recall, F1-Score, ROC-AUC.
1. Accuracy
Definition: The ratio of correctly predicted instances
to the total instances.
Formula:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Use case: Best used when classes are balanced.
Limitation: Misleading in imbalanced datasets.
2. Precision (Positive Predictive Value)
Definition: The ratio of correctly predicted positive
observations to the total predicted positives.
Formula:
Precision = TP / (TP + FP)
Use case: Useful when the cost of false positives is
high (e.g., spam detection).
3. Recall (Sensitivity or True Positive Rate)
Definition: The ratio of correctly predicted positive
observations to all actual positives.
Formula:
Recall = TP / (TP + FN)
Use case: Useful when the cost of false negatives is
high (e.g., disease diagnosis).
4. F1-Score
Definition: Harmonic mean of precision and recall.
Balances the two metrics.
Formula:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
Use case: Best when you need a balance between
precision and recall.
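A minimal sketch of how these four metrics are computed with scikit-learn; the y_true and y_pred label lists are made-up examples, not from any dataset in these notes.
python
# Illustrative sketch: accuracy, precision, recall, and F1 on made-up labels.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual classes (example values)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # predicted classes (example values)

print(accuracy_score(y_true, y_pred))    # (TP + TN) / total
print(precision_score(y_true, y_pred))   # TP / (TP + FP)
print(recall_score(y_true, y_pred))      # TP / (TP + FN)
print(f1_score(y_true, y_pred))          # harmonic mean of precision and recall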
5. ROC-AUC (Receiver Operating Characteristic – Area
Under Curve)
Definition: Measures the model’s ability to
distinguish between classes.
ROC Curve: Plots True Positive Rate (Recall) against
False Positive Rate.
AUC (Area Under Curve): AUC = 1 means perfect
classifier; AUC = 0.5 means random guessing.
Use case: Good for evaluating models across all
classification thresholds.
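A minimal sketch of computing ROC-AUC; unlike the metrics above, it takes predicted probabilities rather than hard class labels. The y_true and y_score values are illustrative placeholders.
python
# Illustrative sketch: ROC-AUC from predicted probabilities (example values only).
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [1, 0, 1, 1, 0, 1, 0, 0]                     # actual classes (made-up)
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1]    # predicted probabilities (made-up)

auc = roc_auc_score(y_true, y_score)                # area under the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)   # points along the ROC curve
print(auc)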
Summary Table:
Metric      Focuses On               Best When...
Accuracy    Overall correctness      Classes are balanced
Precision   False positives          False positives are costly
Recall      False negatives          False negatives are costly
F1-Score    Precision & Recall       Balance needed between FP and FN
ROC-AUC     Classification ability   You care about ranking, not threshold
Model Training and Evaluation
1. Train-Test Split
This is the basic method for evaluating the performance of a
machine learning model.
You split your dataset into:
o Training set (e.g., 80%) – used to train the model.
o Test set (e.g., 20%) – used to evaluate model performance
on unseen data.
python
from sklearn.model_selection import train_test_split

# Hold out 20% of the data as an unseen test set;
# random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
2. Cross-Validation
Cross-validation improves reliability by splitting the data into
multiple folds.
The most common method is k-Fold Cross-Validation:
o The dataset is divided into k subsets (folds), e.g., 5 or 10.
o The model is trained on k-1 folds and tested on the remaining fold.
o This repeats k times, each time with a different test fold.
o The final performance metric is the average score across all folds.
python
from sklearn.model_selection import cross_val_score

# Score the model with 5-fold cross-validation and report the mean.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
3. Hyperparameter Tuning using GridSearchCV
Hyperparameters are parameters that are not learned during
training (e.g., the number of trees in a Random Forest).
GridSearchCV performs an exhaustive search over a grid of
hyperparameter values using cross-validation.
python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Grid of hyperparameter values to search over.
param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [None, 10, 20]
}

# Exhaustively try every combination, scoring each with 5-fold cross-validation.
grid_search = GridSearchCV(estimator=RandomForestClassifier(),
                           param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)

print(grid_search.best_params_)  # best hyperparameter combination found
print(grid_search.best_score_)   # its mean cross-validated score
4. Overfitting and Underfitting
Overfitting
Description: Model learns the training data too well, including noise.
Symptoms: High accuracy on training, poor on test.
Solution: Regularization, more data, simpler model, cross-validation.
Underfitting
Description: Model is too simple to capture patterns.
Symptoms: Poor performance on both training and test.
Solution: Use a more complex model, feature engineering, reduce regularization.
Visual Example:
Underfitting: Straight line on a curved pattern.
Overfitting: Complex zig-zag curve trying to fit every point.
Good fit: Smooth curve that generalizes well.
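As a rough illustration (not part of the notes above), one common way to spot over- or underfitting is to compare training and test scores. This sketch assumes the X_train, X_test, y_train, y_test split from the train-test section; the DecisionTreeClassifier and its settings are made-up examples.
python
# Illustrative sketch: comparing train vs. test scores to spot over/underfitting.
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(max_depth=None)   # very deep tree: prone to overfitting
model.fit(X_train, y_train)

train_score = model.score(X_train, y_train)      # accuracy on the training data
test_score = model.score(X_test, y_test)         # accuracy on unseen data

# A large gap (high train score, much lower test score) suggests overfitting;
# low scores on both suggest underfitting.
print(train_score, test_score)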