SUPPORT_VECTOR_MACHINES
TYehan edited this page Feb 27, 2025 · 1 revision
This document explains the primary machine learning concepts demonstrated in the Support Vector Machines (SVM) practical notebook. The notebook uses the breast cancer dataset to build an SVM classifier, visualize decision boundaries, and evaluate the model.
- Dataset Acquisition:
  - The breast cancer dataset is loaded using `load_breast_cancer` from `sklearn.datasets`.
- Dataset Inspection:
  - The feature names and target values are printed to verify and inspect the structure of the dataset.
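The loading and inspection steps above can be sketched as follows. This is a minimal illustration, not the notebook's exact code; the variable name `data` is an assumption.

```python
from sklearn.datasets import load_breast_cancer

# Load the dataset as a Bunch object (dictionary-like container).
data = load_breast_cancer()  # `data` is an illustrative name

# Inspect the structure: feature names, class names, and array shapes.
print(data.feature_names)   # names of the 30 numeric features
print(data.target_names)    # class labels: 'malignant' (0) and 'benign' (1)
print(data.data.shape)      # (569, 30): 569 samples, 30 features
print(data.target[:10])     # first ten encoded target values
```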
- Feature Selection:
  - A subset of features (the first two features) is selected for training and visualization, simplifying the decision boundary display.
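Selecting the first two features amounts to slicing the first two columns of the feature matrix. A minimal sketch, assuming the names `X` and `y` for the features and targets:

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Keep only the first two columns so the decision boundary can be drawn in 2D.
X = data.data[:, :2]   # columns 0 and 1: 'mean radius' and 'mean texture'
y = data.target        # 0 = malignant, 1 = benign

print(data.feature_names[:2])
print(X.shape)         # (569, 2)
```

Two features are needed because `DecisionBoundaryDisplay` plots over a two-dimensional grid; the trade-off is that the classifier ignores the other 28 features.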
- SVM Classifier:
  - An SVM classifier (`SVC`) with an RBF kernel is initialized with parameters `gamma=0.5` and `C=1.0`.
- Model Training:
  - The SVM model is fitted on the selected features and corresponding target values from the breast cancer dataset.
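Initialization and training can be sketched together as below. The hyperparameters come from the description above; the variable name `svm` is an assumption.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

data = load_breast_cancer()
X, y = data.data[:, :2], data.target  # first two features only

# RBF-kernel SVM: gamma sets the kernel width, C the regularization strength.
svm = SVC(kernel="rbf", gamma=0.5, C=1.0)
svm.fit(X, y)

# Number of support vectors found for each class.
print(svm.n_support_)
```

A larger `gamma` makes the boundary follow individual points more closely, while a larger `C` penalizes misclassified training points more heavily.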
- Boundary Plotting:
  - The `DecisionBoundaryDisplay.from_estimator` function is used to generate and visualize the decision boundary of the trained SVM.
- Scatter Plot Overlay:
  - A scatter plot overlays the decision boundary, displaying data points colored by their target class.
- Plot Customization:
  - Both subplots include axis labels, titles, and custom color mapping (using `plt.cm.Spectral`) to enhance interpretability.
- Alternate Visualization:
  - An additional visualization approach is provided in which the decision boundary and data points are shown in a single plot.
- Model Interpretation:
  - These visualizations help assess how well the SVM classifier separates the two classes in the feature space.
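A numeric check can complement the visual assessment. The sketch below is an assumption about how one might quantify the separation; the steps listed above do not state which metrics the notebook computes. It scores the model on the same data it was trained on, so the numbers are optimistic compared with a held-out test set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.svm import SVC

data = load_breast_cancer()
X, y = data.data[:, :2], data.target
svm = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)

# Predict on the training data (illustrative only; not a held-out evaluation).
y_pred = svm.predict(X)

acc = accuracy_score(y, y_pred)    # fraction of correctly classified samples
cm = confusion_matrix(y, y_pred)   # rows: true class, columns: predicted class

print(f"Training accuracy: {acc:.3f}")
print(cm)
```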
This practical notebook serves as a worked example of applying Support Vector Machines to a classification task, covering feature selection, model training, and decision boundary visualization on a real-world dataset.