Natural Language Processing Notes Class 10 AI

The document discusses natural language processing (NLP). NLP allows computers to understand human language text and speech. It describes some common NLP applications like automatic summarization, sentiment analysis, text classification, and virtual assistants. It also discusses chatbots and the differences between script bots and smart bots. The document then compares human language to computer language, noting challenges computers face in understanding things like syntax, semantics, multiple word meanings, and having perfect syntax with no meaning. It concludes by outlining some initial steps in processing human language text for computers, including text normalization.

Uploaded by

madhansrivallab

AI Simplified by Aiforkids.

NLP Class 10 AI Notes: NATURAL LANGUAGE PROCESSING

[Mind map: chapter overview]

What: The ability of a computer to understand text and spoken words.
Revision: AI Project Cycle
Chat Bots: Script Bot, Smart Bot. Ex. Mitsuku Bot, Clever Bot, Jabberwacky, and Haptik.
Why (Applications of NLP): Automatic Summarization, Sentiment Analysis, Text classification, Virtual Assistants
Human Language VS Computer Language (problems in understanding human languages by computers):
    Arrangement of words & meanings: (Structure) Syntax, (Meaning) Semantics
    Multiple Meanings of a word
    Perfect Syntax, no Meaning
NLP Process (process to simplify human language to make it understandable):
    Data Processing
    Text Normalisation: Sentence Segmentation, Tokenisation, Removal of Stop words, Converting into same case, Stemming and Lemmatization
    Bag of Words Algorithm
    TFIDF: Term Frequency, Inverse Document Frequency, Applications of TFIDF

Resources: Youtube.com/aiforkids, Aiforkids.in/class-10/nlp

"Learning is not a course, it's a path from passion to profession." ~Lalit Kumar

What is NLP?

Natural Language Processing (NLP) is the sub-field of AI that focuses on the ability of a computer to understand human language (commands), whether spoken or written, and to give an output by processing it.

Applications of NLP

Automatic Summarization
Summarizes the meaning of documents and information.
Extracts the key emotional information from the text to understand reactions (for example, on social media).

Sentiment Analysis

Identify sentiments and emotions from one or more posts

Companies use it to identify opinions and sentiments to get feedback
Can be Positive, Negative or Neutral

Text classification

Assign predefined categories to a document and organize it to help you find the
information you need or simplify some activities.
Eg: Spam filtering in email.

Virtual Assistants
By accessing our data, they can help us in keeping notes of our tasks, making
calls for us, sending messages, and a lot more.
With speech recognition, these assistants can not only detect our speech but
can also make sense of it.
A lot more advancements are expected in this field in the near future
Eg: Google Assistant, Cortana, Siri, Alexa, etc

Revising AI Project Cycle

Project Cycle is a step-by-step process to solve problems using proven scientific methods and drawing inferences about them.

1 Components of Project Cycle

Problem Scoping - Understanding the problem

Data Acquisition - Collecting accurate and reliable data
Data Exploration - Arranging the data uniformly
Modelling - Creating Models from the data
Evaluation - Evaluating the project

Problem Statement Template

Part of the statement                                       Question answered
The stakeholder                                             Who
Have a problem: the issue/problem                           What
When/while: the context/situation/location                  Where
Ideal solution: how the solution will help stakeholders     Why

ChatBots

One of the most common applications of Natural Language Processing is a chatbot.

Examples: Mitsuku Bot, Clever Bot, Jabberwacky, Haptik, Rose, OChatbot

Types of ChatBots

Script bots                                   Smart bots
Easy to make                                  Comparatively difficult to make
Work on the script of the programmed set      Work on bigger databases
Limited functionality                         Wide functionality
No or little language processing skills       Coding is required
Example: Customer care bots                   Examples: Google Assistant, Alexa, Cortana, Siri, etc.

Human Language VS Computer Language

1 Human Language

Humans communicate through language which we process all the time.

Our brain keeps on processing the sounds that it hears around itself and
tries to make sense out of them all the time.
Communications made by humans are complex.
2 Computer Language

The computer understands the language of numbers.

Everything that is sent to the machine has to be converted to numbers.
If a single mistake is made, the computer throws an error and does not process that part.
The communications made by machines are very basic and simple.

Errors in Processing Human language

Arrangement of words and meaning
    Different Syntax, Same Meaning
    Different Meaning, Same Syntax
Multiple Meanings of the Word
Perfect Syntax, No Meaning

[Figure: Our Brain: Listen, Prioritize, Process]

Arrangement of the words and meaning

Syntax: Syntax refers to the grammatical structure of a sentence.

Semantics: It refers to the meaning of the sentence.

Different syntax, same semantics: 2+3 = 3+2

Here the two expressions are written differently, but their meaning is the same: both evaluate to 5.
Different semantics, same syntax: 3/2 (Python 2.7) ≠ 3/2 (Python 3).

Here we have the same syntax but their meanings are different. In Python
2.7, this statement would result in 1 while in Python 3, it would give an
output of 1.5.
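The same contrast can be reproduced in Python 3 alone. This is only a small sketch: `/` is true division, while `//` (floor division) gives the value that `3/2` produced for integers in Python 2.7. The variable names are illustrative.

```python
# Python 3: "/" is true division, "//" is floor division.
true_division = 3 / 2     # 1.5
floor_division = 3 // 2   # 1, the value 3/2 had for integers in Python 2.7

# Different syntax, same semantics: both expressions evaluate to 5.
same_meaning = (2 + 3) == (3 + 2)

print(true_division, floor_division, same_meaning)
```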

1 Multiple Meanings of a word

To understand this, consider the following three sentences, each of which uses the word "red":

1. "His face turned red after he found out that he had taken the wrong bag"
Possibilities: He feels ashamed because he took another person’s bag instead of his own, OR he is feeling angry because he did not manage to steal the bag that he had been targeting.
2. "The red car zoomed past his nose"
Possibilities: Probably talking about the color of the car that traveled close to him in a flash.
3. "His face turns red after consuming the medicine"
Possibilities: Is he having an allergic reaction? Or is he not able to bear the taste of that medicine?

2 Perfect Syntax, no Meaning

1. "Chickens feed extravagantly while the moon drinks tea"

Meaning: This statement is correct grammatically but makes no sense. In
Human language, a perfect balance of syntax and semantics is important for
better understanding.

Data Processing

Since the language of computers is numerical, the very first step that comes to mind is to convert our language to numbers.
This conversion takes a few steps. The first of them is Text Normalisation.

Text Normalisation

In Text Normalisation, we go through several steps to normalise the text to a lower level. We work on text from multiple documents, and the term used for the whole textual data from all the documents together is "Corpus".

1 Sentence Segmentation
Under sentence segmentation, the whole corpus is divided into sentences. Each sentence is treated as a separate piece of data, so the whole corpus is reduced to a list of sentences.

Example:

Before Sentence Segmentation:
“You want to see the dreams with close eyes and achieve them? They’ll remain dreams, look for AIMs and your eyes have to stay open for a change to be seen.”

After Sentence Segmentation:
1. You want to see the dreams with close eyes and achieve them?
2. They’ll remain dreams, look for AIMs and your eyes have to stay open for a change to be seen.
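A rough sketch of sentence segmentation on this example follows. It assumes a deliberately simple rule (split after `.`, `?` or `!` when followed by whitespace), so it would mis-split abbreviations such as "Dr."; the function name `segment_sentences` is illustrative, not part of any library.

```python
import re

corpus = ("You want to see the dreams with close eyes and achieve them? "
          "They'll remain dreams, look for AIMs and your eyes have to stay "
          "open for a change to be seen.")

def segment_sentences(text):
    """Split text after '.', '?' or '!' when whitespace follows (naive rule)."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [part for part in parts if part]

sentences = segment_sentences(corpus)
for number, sentence in enumerate(sentences, start=1):
    print(number, sentence)
```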

2 Tokenisation

A “Token” is a term used for any word, number, or special character occurring in a sentence.

Under Tokenisation, every word, number, and special character is considered separately, and each of them becomes a separate token.

Corpus: A corpus can be defined as a collection of text documents.

Example: You want to see the dreams with close eyes and achieve them?

Tokens: You | want | to | see | the | dreams | with | close | eyes | and | achieve | them | ?
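Tokenisation of the example sentence might be sketched with a regular expression as below. The pattern is an assumption chosen for illustration: runs of letters or digits become one token, and every other non-space character becomes its own token.

```python
import re

sentence = "You want to see the dreams with close eyes and achieve them?"

def tokenise(text):
    """Return words/numbers and individual special characters as tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenise(sentence)
print(tokens)
print(len(tokens), "tokens")
```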

4 Removal of Stopwords

Stopwords: Stopwords are the words that occur very frequently in the
corpus but do not add any value to it.

Examples: a, an, and, are, as, for, it, is, into, in, if, on, or, such, the, there, to.

In this step, the tokens which are not necessary are removed from the token
list. To make it easier for the computer to focus on meaningful terms, these
words are removed.

Along with these words, our corpus might often contain special characters and/or numbers. Whether to remove them depends on the corpus. For example, if you are working on a document containing email IDs, you might not want to remove the special characters and numbers.

Example: You want to see the dreams with close eyes and achieve them?

The removed tokens would be: to, the, and, ?

The outcome would be: You want see dreams with close eyes achieve them
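The removal step for this example can be sketched as follows. The stopword set is the example list given in these notes, and dropping every non-alphanumeric token is an assumption that suits this sentence (it would not suit a corpus of email IDs).

```python
# Example stopword list taken from these notes.
STOPWORDS = {"a", "an", "and", "are", "as", "for", "it", "is", "into",
             "in", "if", "on", "or", "such", "the", "there", "to"}

tokens = ["You", "want", "to", "see", "the", "dreams", "with", "close",
          "eyes", "and", "achieve", "them", "?"]

def remove_stopwords(tokens, stopwords=STOPWORDS):
    kept = []
    for token in tokens:
        if token.lower() in stopwords:
            continue  # frequent word that adds little value
        if not token.isalnum():
            continue  # special character such as "?"
        kept.append(token)
    return kept

filtered = remove_stopwords(tokens)
print(" ".join(filtered))
```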

5 Converting text to a common case

We convert the whole text into the same case, preferably lower case. This ensures that the case sensitivity of the machine does not treat the same words as different just because they are written in different cases.
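A tiny illustration of why a common case matters, using a made-up list of tokens: before conversion the machine sees four different strings, afterwards only one.

```python
# Hypothetical tokens: the same word written in four different cases.
tokens = ["Hello", "HELLO", "hello", "HeLLo"]

lowered = [token.lower() for token in tokens]

unique_before = set(tokens)   # four distinct strings
unique_after = set(lowered)   # a single string

print(len(unique_before), len(unique_after))
```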

6 Stemming

Stemming is a technique used to extract the base form of words by removing affixes from them. It is just like cutting down the branches of a tree to its stems.

The stemmed word might not be meaningful.

Example:

Word       Affix    Stem
healing    ing      heal
dreams     s        dream
studies    es       studi
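The table above can be reproduced with a toy suffix stripper. This is only a sketch under simple assumptions (three suffixes, checked in a fixed order); it is not the Porter stemmer or any library's algorithm, and `naive_stem` is an illustrative name.

```python
SUFFIXES = ("ing", "es", "s")  # checked in this order

def naive_stem(word):
    """Strip the first matching suffix without checking the result is a word."""
    word = word.lower()
    for suffix in SUFFIXES:
        # keep at least three letters so very short words are left alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for word in ["healing", "dreams", "studies", "caring"]:
    print(word, "->", naive_stem(word))
```

Because the stemmer only cuts characters, "studies" becomes "studi" and "caring" becomes "car", neither of which is the intended word.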

7 Lemmatization

In lemmatization, the word we get after affix removal (known as the lemma) is a meaningful one. Lemmatization makes sure that the lemma is a word with meaning, and hence it takes longer to execute than stemming.

Example:

Word       Affix    Lemma
healing    ing      heal
dreams     s        dream
studies    es       study
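One way to picture lemmatization is as a dictionary lookup. The sketch below uses a tiny hand-written table (an assumption for illustration only); real lemmatizers consult a full vocabulary and grammatical information, which is part of why they run slower than stemmers.

```python
# Hypothetical mini-dictionary mapping word forms to their lemmas.
LEMMA_TABLE = {
    "healing": "heal",
    "dreams": "dream",
    "studies": "study",
    "caring": "care",
}

def lookup_lemma(word):
    """Return the lemma if the word is known, otherwise the word itself."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)

for word in ["healing", "dreams", "studies", "caring"]:
    print(word, "->", lookup_lemma(word))
```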

Difference between stemming and lemmatization

Stemming                                       Lemmatization
The stemmed word might not be meaningful.      The lemma is a meaningful word.
Caring ➔ Car                                   Caring ➔ Care

Bag of Words Algorithm

Bag of Words creates a set of vectors containing the count of word occurrences in each document. Bag of Words vectors are easy to interpret.

The bag of words gives us two things:

A vocabulary of words for the corpus
The frequency of these words (the number of times each has occurred in the whole corpus).

Calling this algorithm a “bag” of words symbolizes that the sequence of sentences or tokens does not matter here; all we need are the unique words and their frequencies.

Steps of the bag of words algorithm

1. Text Normalisation: Collecting data and pre-processing it

2. Create Dictionary: Making a list of all the unique words occurring in
the corpus. (Vocabulary)
3. Create a document vector: For a document in the corpus, find out how many times each word from the unique list of words has occurred.
4. Create document vectors for all the documents.

Example:
Step 1: Collecting data and pre-processing it.

Raw Data
Document 1: Aman and Anil are stressed
Document 2: Aman went to a therapist
Document 3: Anil went to download a health chatbot

Processed Data
Document 1: [aman, and, anil, are, stressed]
Document 2: [aman, went, to, a, therapist]
Document 3: [anil, went, to, download, a, health, chatbot]

Step 2: Create Dictionary

Dictionary in NLP means a list of all the unique words occurring in the corpus. If some words are repeated in different documents, they are written just once while creating the dictionary.

aman, and, anil, are, stressed, went, to, a, therapist, download, health, chatbot

Step 3: Create a document vector

The document vector contains the frequency of each word of the vocabulary in a particular document.

In the document vector table, the vocabulary is written in the top row.

Now, for each word in the document, if it matches a word in the vocabulary, put a 1 under it.
If the same word appears again, increment the previous value by 1.
And if a vocabulary word does not occur in that document, put a 0 under it.

Document 1:

aman  and  anil  are  stressed  went  to  a  therapist  download  health  chatbot
1     1    1     1    1         0     0   0  0          0         0       0

Step 4: Creating a document vector table for all documents

            aman  and  anil  are  stressed  went  to  a  therapist  download  health  chatbot
Document 1  1     1    1     1    1         0     0   0  0          0         0       0
Document 2  1     0    0     0    0         1     1   1  1          0         0       0
Document 3  0     0    1     0    0         1     1   1  0          1         1       1

In this table, the header row contains the vocabulary of the corpus and
three rows correspond to three different documents.
Finally, this gives us the document vector table for our corpus. But these raw counts alone do not show how valuable each word is. This leads us to the final step of our algorithm: TFIDF.
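The four steps above can be sketched in a few lines of Python. The documents are the processed example from Step 1; the function names are illustrative, and the vocabulary is built in order of first appearance so that it matches the header row of the table.

```python
# Step 1 output: the normalised (tokenised, lower-cased) documents.
documents = [
    ["aman", "and", "anil", "are", "stressed"],
    ["aman", "went", "to", "a", "therapist"],
    ["anil", "went", "to", "download", "a", "health", "chatbot"],
]

def build_vocabulary(docs):
    """Step 2: list every unique word once, in order of first appearance."""
    vocabulary = []
    for doc in docs:
        for word in doc:
            if word not in vocabulary:
                vocabulary.append(word)
    return vocabulary

def document_vector(doc, vocabulary):
    """Step 3: count how often each vocabulary word occurs in one document."""
    return [doc.count(word) for word in vocabulary]

vocabulary = build_vocabulary(documents)
# Step 4: one vector per document.
vectors = [document_vector(doc, vocabulary) for doc in documents]

print(vocabulary)
for vector in vectors:
    print(vector)
```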

TFIDF
TFIDF stands for Term Frequency & Inverse Document Frequency.

1 Term Frequency

1. Term frequency is the frequency of a word in one document.
2. Term frequency can easily be found in the document vector table.

Example:

            aman  and  anil  are  stressed  went  to  a  therapist  download  health  chatbot
Document 1  1     1    1     1    1         0     0   0  0          0         0       0
Document 2  1     0    0     0    0         1     1   1  1          0         0       0
Document 3  0     0    1     0    0         1     1   1  0          1         1       1

Here, as we can see, the frequency of each word for each document has been recorded in the table. These numbers are nothing but the Term Frequencies!

2 Document Frequency

Document Frequency is the number of documents in which the word occurs, irrespective of how many times it has occurred in those documents.

aman  and  anil  are  stressed  went  to  a  therapist  download  health  chatbot
2     1    2     1    1         2     2   2  1          1         1       1

We can observe from the table that:

1. The document frequency of ‘aman’, ‘anil’, ‘went’, ‘to’ and ‘a’ is 2, as they have occurred in two documents.
2. The rest of them occurred in just one document, hence their document frequency is 1.

3 Inverse Document Frequency

In the case of inverse document frequency, we put the document frequency in the denominator and the total number of documents in the numerator:

IDF(W) = Total number of documents / Document frequency of W

aman  and  anil  are  stressed  went  to   a    therapist  download  health  chatbot
3/2   3/1  3/2   3/1  3/1       3/2   3/2  3/2  3/1        3/1       3/1     3/1
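As a sketch, document frequency and inverse document frequency can be derived from the document vector table as below. The vocabulary and vectors are copied from the example; `Fraction` is used only so the ratios print exactly as in the table, and the function name is illustrative.

```python
from fractions import Fraction

vocabulary = ["aman", "and", "anil", "are", "stressed", "went",
              "to", "a", "therapist", "download", "health", "chatbot"]
vectors = [
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # Document 1
    [1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0],  # Document 2
    [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1],  # Document 3
]

def document_frequency(vectors):
    """Count the documents in which each word occurs at least once."""
    columns = range(len(vectors[0]))
    return [sum(1 for row in vectors if row[i] > 0) for i in columns]

df = document_frequency(vectors)
# IDF as defined in these notes: total documents / document frequency.
idf = [Fraction(len(vectors), count) for count in df]

for word, count, ratio in zip(vocabulary, df, idf):
    print(f"{word:10} DF={count} IDF={ratio}")
```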

Formula of TFIDF

The formula of TFIDF for any word W becomes:

TFIDF(W) = TF(W) * log( IDF(W) )

We don’t need to calculate the log values by ourselves. We simply have to use the log function in the calculator and find out! The values in these notes use log to the base 10.

Word         Document 1    Document 2    Document 3
aman         1*log(3/2)    1*log(3/2)    0*log(3/2)
and          1*log(3)      0*log(3)      0*log(3)
anil         1*log(3/2)    0*log(3/2)    1*log(3/2)
are          1*log(3)      0*log(3)      0*log(3)
stressed     1*log(3)      0*log(3)      0*log(3)
went         0*log(3/2)    1*log(3/2)    1*log(3/2)
to           0*log(3/2)    1*log(3/2)    1*log(3/2)
a            0*log(3/2)    1*log(3/2)    1*log(3/2)
therapist    0*log(3)      1*log(3)      0*log(3)
download     0*log(3)      0*log(3)      1*log(3)
health       0*log(3)      0*log(3)      1*log(3)
chatbot      0*log(3)      0*log(3)      1*log(3)

After calculating all the values, we get:

Word         Document 1    Document 2    Document 3
aman         0.176         0.176         0
and          0.477         0             0
anil         0.176         0             0.176
are          0.477         0             0
stressed     0.477         0             0
went         0             0.176         0.176
to           0             0.176         0.176
a            0             0.176         0.176
therapist    0             0.477         0
download     0             0             0.477
health       0             0             0.477
chatbot      0             0             0.477

Finally, the words have been converted to numbers. These numbers are
the values of each document.
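The whole calculation can be checked with a short script. This is a sketch that follows the formula in these notes, TFIDF(W) = TF(W) * log(IDF(W)), and assumes log base 10 because that is what reproduces 0.176 and 0.477; library implementations of TF-IDF generally use different variants of the formula.

```python
import math

vocabulary = ["aman", "and", "anil", "are", "stressed", "went",
              "to", "a", "therapist", "download", "health", "chatbot"]
vectors = [
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # Document 1
    [1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0],  # Document 2
    [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1],  # Document 3
]

def tfidf_table(vectors):
    """Return TF * log10(total documents / document frequency) per cell."""
    total = len(vectors)
    columns = range(len(vectors[0]))
    df = [sum(1 for row in vectors if row[i] > 0) for i in columns]
    return [[tf * math.log10(total / d) for tf, d in zip(row, df)]
            for row in vectors]

table = tfidf_table(vectors)
for number, row in enumerate(table, start=1):
    print("Document", number, [round(value, 3) for value in row])
```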

Here, we can see that since we have very little data, words like ‘are’ and ‘and’ also have a high value. But as the document frequency of a word increases, its IDF decreases, and so does the value of that word.

That is, for example:

Total Number of documents: 10
Number of documents in which ‘and’ occurs: 10
Therefore, IDF(and) = 10/10 = 1

Which means: log(1) = 0. Hence, the value of ‘and’ becomes 0.

On the other hand, the number of documents in which ‘pollution’ occurs: 3
IDF(pollution) = 10/3 = 3.3333…
This means log(3.3333) ≈ 0.523, which shows that the word ‘pollution’ has considerable value in the corpus.
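The arithmetic for ‘and’ and ‘pollution’ can be verified as follows. The counts are the ones assumed in the example above (10 documents in total), and log base 10 is assumed as in the rest of these notes.

```python
import math

total_documents = 10
documents_containing = {"and": 10, "pollution": 3}

def log_idf(word):
    """log10 of (total documents / documents containing the word)."""
    return math.log10(total_documents / documents_containing[word])

for word in documents_containing:
    print(word, round(log_idf(word), 3))
```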

Important concepts to remember:

1. Words that occur in all the documents with high term frequencies have the least values and are considered to be the stopwords.
2. For a word to have a high TFIDF value, the word needs to have a high term frequency but a low document frequency, which shows that the word is important for one document but is not a common word for all documents.
3. These values help the computer understand which words are to be
considered while processing the natural language. The higher the value, the
more important the word is for a given corpus.

Applications of TFIDF

TFIDF is commonly used in the Natural Language Processing domain.

Some of its applications are:

1. Document Classification – Helps in classifying the type and genre of a document.
2. Topic Modelling – It helps in predicting the topic for a corpus.
3. Information Retrieval System – To extract the important information
out of a corpus.
4. Stop word filtering – Helps in removing the unnecessary words from
a text body.

