Email Classification Terminologies Explanations

The document provides an overview of several Python libraries and concepts for data analysis and natural language processing, including Pandas for data manipulation, regular expressions for string pattern matching, and NLTK for NLP tasks. It discusses the importance of stop words, the differences between stemming and lemmatization, and introduces the Bag of Words model for converting text into numerical features. Additionally, it explains the concepts of precision and recall in the context of evaluating predictive models.


1. Pandas
Pandas is an open-source Python library used for data analysis and data manipulation. It
provides fast, flexible, and expressive data structures like:

• Series: One-dimensional labeled array.

• DataFrame: Two-dimensional labeled data structure (like a spreadsheet or SQL table).

Accessing Different Types of Datasets with Pandas

Format        Function
CSV           pd.read_csv('filename.csv')
Excel         pd.read_excel('filename.xlsx')
JSON          pd.read_json('filename.json')
SQL           pd.read_sql(query, connection)
Parquet       pd.read_parquet('filename.parquet')
HTML Tables   pd.read_html('url_or_file.html')
Clipboard     pd.read_clipboard()
Pickle        pd.read_pickle('filename.pkl')
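As a minimal sketch of the most common case, the snippet below writes a tiny CSV file and reads it back with pd.read_csv (the filename "emails.csv" and its columns are made up for illustration):

```python
import pandas as pd

# Create a small CSV so the example is self-contained
# ("emails.csv" and its columns are illustrative, not from a real dataset).
with open("emails.csv", "w") as f:
    f.write("subject,label\nWin a prize now,spam\nMeeting at 10am,ham\n")

df = pd.read_csv("emails.csv")   # returns a DataFrame
print(df.shape)                  # (2, 2)
print(df["label"].tolist())      # ['spam', 'ham']
```

The other readers in the table work the same way: each returns a DataFrame (or a list of DataFrames, in the case of pd.read_html).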

2. Regular Expression Library (re)


Common Functions in re

Function      Description
re.search()   Searches for a pattern anywhere in a string
re.match()    Checks if the pattern matches at the beginning of the string
re.findall()  Returns all non-overlapping matches as a list
re.sub()      Replaces matches with a new string
re.split()    Splits a string by the matches of the pattern
re.compile()  Compiles a pattern for reuse (efficient in loops)
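Each function in the table can be demonstrated on one small string (the example text is made up):

```python
import re

text = "Order #123 shipped. Order #456 pending."

print(re.search(r"#\d+", text).group())   # '#123' — found anywhere in the string
print(re.match(r"Order", text) is None)   # False — pattern matches at the start
print(re.findall(r"#\d+", text))          # ['#123', '#456'] — all matches
print(re.sub(r"\d+", "N", text))          # 'Order #N shipped. Order #N pending.'
print(re.split(r"\.\s*", text))           # ['Order #123 shipped', 'Order #456 pending', '']

pattern = re.compile(r"#\d+")             # compile once, reuse in loops
print(pattern.findall(text))              # ['#123', '#456']
```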

3. NLTK Library
NLTK is a powerful Python package used for Natural Language Processing (NLP) tasks,
like:

• Tokenization

• Stemming

• Stopword removal

• Text classification

• Sentiment analysis
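A small taste of NLTK, using the Porter stemmer (which works without downloading any extra data; NLTK's tokenizers and stopword list additionally require nltk.download("punkt") and nltk.download("stopwords")):

```python
from nltk.stem import PorterStemmer

# PorterStemmer needs no corpus downloads, unlike word_tokenize or stopwords.
stemmer = PorterStemmer()
words = ["playing", "played", "plays", "studies"]
print([stemmer.stem(w) for w in words])  # ['play', 'play', 'play', 'studi']
```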

4. What are Stop words in NLP?


Stop words are the most common words in a language that usually do not carry
important meaning, especially for tasks like text classification, search engines, or NLP
models.

Why Remove Stop words?

Removing stop words helps to:

• Reduce noise in text data

• Decrease dimensionality of data

• Improve model performance in many NLP tasks
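Stop word removal is simple to sketch in pure Python. The stop list below is a tiny hand-picked set for illustration; in practice you would use a full list such as nltk.corpus.stopwords.words("english"):

```python
# Tiny illustrative stop list — real NLP pipelines use a much longer one.
STOP_WORDS = {"a", "an", "the", "is", "in", "to", "of", "i", "do", "not"}

def remove_stop_words(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

print(remove_stop_words("The prize is waiting in your inbox"))
# ['prize', 'waiting', 'your', 'inbox']
```

Note that for some tasks (e.g., sentiment analysis) words like "not" carry meaning, so the stop list should be chosen per task.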

5. Stemming vs Lemmatization
Stemming reduces a word to its stem by chopping off prefixes and suffixes. The result is not always a valid word (for example, "studies" becomes "studi"). Stemming is widely used in natural language understanding (NLU) and natural language processing (NLP).

Lemmatization is similar to stemming, but its output, called a lemma, is the dictionary base form of the word rather than a truncated stem. After lemmatization we always get a valid word with the same meaning (for example, "studies" becomes "study").
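The contrast can be sketched with NLTK's Porter stemmer next to a toy lemma lookup (the LEMMAS dict is a hand-written stand-in for a real lemmatizer such as WordNetLemmatizer, which requires nltk.download("wordnet")):

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Toy lemma table standing in for a real lemmatizer — illustration only.
LEMMAS = {"studies": "study", "running": "run", "better": "good"}

for word in ["studies", "running"]:
    stem = stemmer.stem(word)       # may not be a valid word
    lemma = LEMMAS.get(word, word)  # always a dictionary word
    print(word, "->", stem, "vs", lemma)
# studies -> studi vs study
# running -> run vs run
```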

6. Bag of Words
What is Bag of Words (BoW)?

The Bag of Words model is a technique used to convert text into numerical features so
that it can be used by machine learning models.
• It counts the frequency of each word in a document.

• It ignores grammar and word order, and focuses only on word occurrences.

How BoW Works — Step-by-Step

Suppose you have these 3 sentences:

1. "I love playing football"

2. "Football is a great game"

3. "I do not like football"

Step 1: Build Vocabulary (Unique Words)

From all the sentences, list all unique words:

['i', 'love', 'playing', 'football', 'is', 'a', 'great', 'game', 'do', 'not', 'like']

(11 unique words)

Step 2: Vectorize Sentences

Now convert each sentence into a vector using word counts from the vocabulary.

Sentence BoW Vector (frequency of each word)

I love playing football [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

Football is a great game [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0]

I do not like football [1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1]
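The two steps above can be sketched in pure Python on the same three sentences (in practice a library vectorizer such as scikit-learn's CountVectorizer does this, though its default tokenizer drops one-letter words like "i" and "a"):

```python
sentences = [
    "I love playing football",
    "Football is a great game",
    "I do not like football",
]

# Step 1: build the vocabulary of unique lowercase words, in first-seen order.
vocab = []
for s in sentences:
    for w in s.lower().split():
        if w not in vocab:
            vocab.append(w)

# Step 2: count each vocabulary word in each sentence.
vectors = [[s.lower().split().count(w) for w in vocab] for s in sentences]

print(vocab)
# ['i', 'love', 'playing', 'football', 'is', 'a', 'great', 'game', 'do', 'not', 'like']
print(vectors[0])  # [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
```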

Summary

Feature    BoW
Use        Convert text to numbers
Based on   Word frequency
Ignores    Grammar, word order
Output     Vectors of word counts

7. Precision vs Recall
Precision vs Recall (Intuition)

• Precision answers: "Of all the messages predicted as spam, how many were actually spam?" It matters when false positives are costly (e.g., mislabeling an important email as spam).

• Recall answers: "Of all the actual spam messages, how many did we correctly identify?" It matters when missing positives is risky (e.g., missing a spam email that contains a phishing link).
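Both metrics follow directly from the confusion-matrix counts. The numbers below are invented for illustration:

```python
# Toy confusion-matrix counts for a spam classifier (illustrative numbers).
tp = 40  # spam correctly flagged as spam
fp = 10  # ham wrongly flagged as spam
fn = 20  # spam that slipped through as ham

precision = tp / (tp + fp)  # of everything flagged as spam, how much was spam?
recall = tp / (tp + fn)     # of all real spam, how much did we catch?

print(precision)         # 0.8
print(round(recall, 3))  # 0.667
```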
