Chapter 6

The document outlines the process for developing an SMS spam filter, including dataset collection, preprocessing, feature engineering, model selection, training, evaluation, testing, and deployment. Key steps involve cleaning text data, converting it into numerical formats, selecting appropriate machine learning or deep learning models, and evaluating performance using metrics like accuracy and F1 score. The final goal is to deploy the model for real-world SMS classification.

Uploaded by

shalinigowda004
Copyright © All Rights Reserved
SMS spam filter 2024-25

Chapter 6
Testing
1. Dataset Collection

- Obtain a dataset: Use an existing SMS dataset, such as the "SMS Spam Collection Dataset" from the UCI Machine Learning Repository or Kaggle.

- Create a dataset: Collect SMS messages and label them as "spam" or "ham" (not spam).
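As a minimal sketch of loading the UCI file, assuming its usual layout of one message per line with the label and the text separated by a tab, the function below parses lines into (label, message) pairs. Kaggle copies are often CSV files with different column names and would need a different reader. The function name `load_sms_collection` is hypothetical.

```python
import io
from typing import Iterable, List, Tuple


def load_sms_collection(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse 'label<TAB>message' lines into (label, message) pairs.

    Assumes the tab-separated layout of the UCI SMS Spam Collection file.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first tab only, so tabs inside a message are kept.
        label, _, message = line.partition("\t")
        label = label.strip().lower()
        if label not in ("spam", "ham"):
            raise ValueError(f"unexpected label: {label!r}")
        pairs.append((label, message))
    return pairs


# In-memory sample standing in for the real file.
sample = io.StringIO(
    "ham\tSee you at lunch?\n"
    "spam\tWIN a FREE prize now! Text WIN to claim\n"
)
data = load_sms_collection(sample)
print(data)
```

To read the downloaded dataset, pass an open file handle instead of the sample, e.g. `open("SMSSpamCollection", encoding="utf-8")`; the file name and encoding are assumptions.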

2. Preprocessing

- Text cleaning: Remove unnecessary characters (punctuation, special symbols, etc.).

- Tokenization: Split messages into words or tokens.

- Lowercasing: Convert all text to lowercase for uniformity.

- Stopword removal: Remove common words that carry little meaning on their own (e.g., "the", "and").

- Stemming/Lemmatization: Reduce words to their root form.
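The preprocessing steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions: the stopword list is a small hypothetical one, `crude_stem` is a toy suffix stripper rather than a real stemmer (libraries such as NLTK provide Porter stemming and WordNet lemmatization), and the text is lowercased first so that the cleaning pattern only needs to match lowercase ASCII letters and digits.

```python
import re
from typing import List

# Hypothetical, deliberately small stopword list; a fuller list would
# normally come from a library.
STOPWORDS = {"the", "and", "a", "an", "to", "is", "of", "in", "on", "for", "you", "at"}


def crude_stem(token: str) -> str:
    """Strip one common English suffix (a toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(message: str) -> List[str]:
    text = message.lower()                               # lowercasing
    text = re.sub(r"[^a-z0-9\s]", " ", text)             # cleaning: drop punctuation and symbols
    tokens = text.split()                                # tokenization on whitespace
    tokens = [t for t in tokens if t not in STOPWORDS]   # stopword removal
    return [crude_stem(t) for t in tokens]               # stemming


print(preprocess("Call NOW to claim your FREE prize!!!"))
```

The crude stemmer shows why real stemmers use rule sets: it turns "prizes" into "priz", which a proper stemmer or lemmatizer would handle more consistently.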

3. Feature Engineering

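- Convert text to numerical data:

  - Bag of Words (BoW): represent each message by the counts of the words it contains.

  - TF-IDF (Term Frequency-Inverse Document Frequency): weight each word count down when the word appears in many messages.

  - Word embeddings: pre-trained embeddings such as Word2Vec or GloVe, or contextual embeddings from transformer models (e.g., BERT).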
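For the first two representations, the sketch below builds a vocabulary, bag-of-words count vectors, and L2-normalized TF-IDF vectors using only the standard library. It assumes the smoothed IDF variant idf = ln((1 + N) / (1 + df)) + 1, where N is the number of messages and df is the number of messages containing the word; this is similar to the default in scikit-learn's `TfidfVectorizer`, but other definitions exist. Word embeddings require a pre-trained model and are not shown. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def build_vocabulary(docs: List[List[str]]) -> Dict[str, int]:
    """Map each distinct token to a column index (sorted for reproducibility)."""
    return {tok: i for i, tok in enumerate(sorted({t for doc in docs for t in doc}))}


def bag_of_words(doc: List[str], vocab: Dict[str, int]) -> List[int]:
    """Raw term counts; tokens outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for tok in doc:
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec


def fit_idf(docs: List[List[str]], vocab: Dict[str, int]) -> List[float]:
    """Smoothed IDF per vocabulary column: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))  # document frequency
    idf = [0.0] * len(vocab)
    for tok, i in vocab.items():
        idf[i] = math.log((1 + n) / (1 + df[tok])) + 1
    return idf


def tf_idf(doc: List[str], vocab: Dict[str, int], idf: List[float]) -> List[float]:
    """Relative term frequency times IDF, scaled to unit Euclidean length."""
    counts = bag_of_words(doc, vocab)
    total = sum(counts) or 1
    weights = [c / total * w for c, w in zip(counts, idf)]
    norm = math.sqrt(sum(x * x for x in weights)) or 1.0
    return [x / norm for x in weights]


docs = [["free", "prize"], ["see", "you", "free"]]
vocab = build_vocabulary(docs)
idf = fit_idf(docs, vocab)
print(bag_of_words(["free", "free", "unknown"], vocab))
print(tf_idf(["free", "prize"], vocab, idf))
```

Because "free" occurs in every message, it gets the minimum IDF of 1.0, so the rarer "prize" receives the larger TF-IDF weight.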

4. Model Selection

- Use machine learning models such as:

  - Naive Bayes.

  - Support Vector Machines (SVM).

  - Logistic Regression.

  - Random Forest.

- Or deep learning models such as:

  - Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks.

  - Transformer-based models (e.g., BERT, DistilBERT).
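As one example from the list, a multinomial Naive Bayes classifier with add-alpha (Laplace) smoothing can be written from scratch over the token lists produced during preprocessing. This is a teaching sketch: the class name `TokenNaiveBayes` is hypothetical, and in practice a library implementation such as scikit-learn's `MultinomialNB` would typically be applied to the feature vectors instead.

```python
import math
from collections import Counter
from typing import List


class TokenNaiveBayes:
    """Multinomial Naive Bayes over token lists with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: List[List[str]], labels: List[str]) -> "TokenNaiveBayes":
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        class_counts = Counter(labels)
        # log P(class) estimated from label frequencies.
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        token_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            token_counts[label].update(doc)
        v = len(self.vocab)
        # log P(token | class) with smoothing so unseen pairs are not zero.
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(token_counts[c].values())
            self.log_likelihood[c] = {
                tok: math.log((token_counts[c][tok] + self.alpha) / (total + self.alpha * v))
                for tok in self.vocab
            }
        return self

    def predict_one(self, doc: List[str]) -> str:
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in doc:
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += self.log_likelihood[c][tok]
            scores[c] = score
        return max(scores, key=scores.get)

    def predict(self, docs: List[List[str]]) -> List[str]:
        return [self.predict_one(d) for d in docs]


train_docs = [["free", "prize", "now"], ["win", "cash", "free"],
              ["see", "you", "lunch"], ["call", "me", "later"]]
train_labels = ["spam", "spam", "ham", "ham"]
model = TokenNaiveBayes().fit(train_docs, train_labels)
print(model.predict([["free", "cash"], ["lunch", "later"]]))
```

Working in log space turns the product of many small probabilities into a sum, which avoids floating-point underflow on longer messages.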

5. Train/Test Split

- Split the dataset into training and testing subsets (e.g., 80/20 split).
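The UCI dataset, for example, contains far more ham than spam, so a stratified split that keeps the spam share roughly equal in both subsets is a common refinement of the plain 80/20 split. A minimal sketch with a fixed random seed for reproducibility; the function name and the default seed are assumptions.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def stratified_split(
    items: Sequence[T], labels: Sequence[str], test_ratio: float = 0.2, seed: int = 42
) -> Tuple[List[Tuple[T, str]], List[Tuple[T, str]]]:
    """Split (item, label) pairs so each label keeps about the same share in both subsets."""
    rng = random.Random(seed)
    by_label: Dict[str, List[T]] = {}
    for item, label in zip(items, labels):
        by_label.setdefault(label, []).append(item)
    train: List[Tuple[T, str]] = []
    test: List[Tuple[T, str]] = []
    for label, group in sorted(by_label.items()):
        rng.shuffle(group)                       # shuffle within each class
        n_test = round(len(group) * test_ratio)  # per-class test size
        test += [(x, label) for x in group[:n_test]]
        train += [(x, label) for x in group[n_test:]]
    rng.shuffle(train)                           # mix classes back together
    rng.shuffle(test)
    return train, test


messages = [f"msg{i}" for i in range(100)]
labels = ["ham"] * 80 + ["spam"] * 20
train, test = stratified_split(messages, labels)
print(len(train), len(test), sum(1 for _, lab in test if lab == "spam"))
```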

Department of CS&BS Page | 53

6. Model Training

- Train the model on the training subset only, so that the test subset stays unseen until evaluation.
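Training means fitting the model's parameters to the training subset. To make this concrete for Logistic Regression, the sketch below runs batch gradient descent on the mean log-loss over binary bag-of-words features (1 if a word occurs in the message, else 0). The learning rate, the epoch count, and the function names are illustrative assumptions, not tuned values.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(
    docs: List[List[str]], labels: List[int], epochs: int = 200, lr: float = 0.5
) -> Tuple[Dict[str, int], List[float], float]:
    """Batch gradient descent; labels are 1 for spam and 0 for ham."""
    vocab = {tok: i for i, tok in enumerate(sorted({t for d in docs for t in d}))}
    rows = [{vocab[t] for t in d} for d in docs]  # indices of words present in each message
    w = [0.0] * len(vocab)
    b = 0.0
    n = len(docs)
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for active, y in zip(rows, labels):
            p = sigmoid(b + sum(w[j] for j in active))
            err = p - y  # derivative of log-loss with respect to the logit
            for j in active:
                grad_w[j] += err
            grad_b += err
        # Step against the mean gradient.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return vocab, w, b


def predict_proba(doc: List[str], vocab: Dict[str, int], w: List[float], b: float) -> float:
    """Estimated probability that a tokenized message is spam."""
    return sigmoid(b + sum(w[vocab[t]] for t in set(doc) if t in vocab))


docs = [["free", "prize", "now"], ["win", "cash", "free"],
        ["see", "you", "lunch"], ["call", "me", "later"]]
vocab, w, b = train_logistic_regression(docs, [1, 1, 0, 0])
print(round(predict_proba(["free", "cash"], vocab, w, b), 3))
```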

7. Evaluation

- Use metrics like:

- Accuracy: Percentage of correct predictions.

- Precision: Ratio of correctly predicted spam messages to total predicted spam messages.

- Recall (Sensitivity): Ratio of correctly predicted spam messages to actual spam messages.

- F1 Score: Harmonic mean of precision and recall.
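With spam as the positive class, the four metrics follow from counts of true positives (TP), false positives (FP), and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). A minimal sketch; the function name is hypothetical.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "spam"
) -> Dict[str, float]:
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham flagged as spam
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = ["spam", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]
print(classification_metrics(y_true, y_pred))
```

For these five example labels, three predictions are correct, so accuracy is 0.6, and precision, recall, and F1 are all 2/3. When ham dominates the data, accuracy alone can look high even for a filter that misses much of the spam, which is why precision, recall, and F1 are reported alongside it.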

8. Testing

- Run the trained model on the held-out test dataset to measure its performance.

- Input example SMS texts and check the filter's predictions by hand.
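Listing the misclassified test messages is a practical way to inspect errors by hand. In the sketch below, `keyword_classifier` is a hypothetical placeholder standing in for the trained model (which would be wrapped together with the same preprocessing used in training); its naive substring matching deliberately produces a typical false positive.

```python
from typing import Callable, List, Sequence, Tuple


def find_misclassified(
    classify: Callable[[str], str], test_set: Sequence[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Return (message, true_label, predicted_label) for every wrong prediction."""
    errors = []
    for message, label in test_set:
        predicted = classify(message)
        if predicted != label:
            errors.append((message, label, predicted))
    return errors


# Placeholder for illustration only, not a trained model.
def keyword_classifier(message: str) -> str:
    text = message.lower()
    return "spam" if any(w in text for w in ("free", "win", "prize")) else "ham"


test_set = [
    ("Congratulations, you WIN a free cruise", "spam"),
    ("Are we still meeting at 6?", "ham"),
    ("Is the winter jacket still for sale?", "ham"),
]
for message, true_label, predicted in find_misclassified(keyword_classifier, test_set):
    print(f"{true_label} predicted as {predicted}: {message}")
```

Here the only error reported is the ham message about the winter jacket, which the placeholder flags because "winter" contains the substring "win".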

9. Deployment

- Deploy the model in a real-world application to classify incoming SMS messages.
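One simple deployment pattern, assumed here for illustration, is to export the trained parameters to a file and load them in the service that receives incoming messages. The sketch stores Naive Bayes log-probabilities as JSON; the file name, the toy parameter values, and the function names are hypothetical, and incoming messages must go through the same preprocessing as the training data before `classify_incoming` is called.

```python
import json
import math
import os
import tempfile
from typing import Dict, List


def save_model(path: str, log_prior: Dict[str, float],
               log_likelihood: Dict[str, Dict[str, float]]) -> None:
    """Write the trained parameters to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"log_prior": log_prior, "log_likelihood": log_likelihood}, f)


def load_model(path: str) -> dict:
    """Read the parameters back inside the serving application."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def classify_incoming(model: dict, tokens: List[str]) -> str:
    """Score each label; tokens missing from the stored vocabulary are skipped."""
    scores = {}
    for label, prior in model["log_prior"].items():
        table = model["log_likelihood"][label]
        scores[label] = prior + sum(table[t] for t in tokens if t in table)
    return max(scores, key=scores.get)


# Toy parameters for illustration only (not trained values).
log_prior = {"spam": math.log(0.5), "ham": math.log(0.5)}
log_likelihood = {
    "spam": {"free": math.log(0.6), "lunch": math.log(0.1)},
    "ham": {"free": math.log(0.1), "lunch": math.log(0.6)},
}
path = os.path.join(tempfile.mkdtemp(), "sms_model.json")
save_model(path, log_prior, log_likelihood)
served = load_model(path)
print(classify_incoming(served, ["free"]), classify_incoming(served, ["lunch"]))
```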

Chapter 7

Result Analysis
