NLP Basics

Uploaded by

barwaniwalaaziz56

NLP basics

Only 21% of data is stored in structured form; the rest is still being generated.

This newly generated data is mostly unstructured and in text form.

What is NLP?
Natural language processing (NLP)

helps computers understand human language and work with humans

applications include summarization, language translation, autocomplete, search, chatbots, voice assistants, and many more

What are Corpus, Tokens, and N-grams?

Corpus
A corpus is a collection of text documents. Documents are made up of paragraphs, paragraphs are made up of sentences, and sentences are made up of smaller units, such as words, called tokens.

N-grams
An n-gram is a group of n consecutive words. For example, consider the sentence:
“I love my phone.”
In this sentence, the unigrams (n=1) are: I, love, my, phone
The bigrams (n=2) are: I love, love my, my phone
And the trigrams (n=3) are: I love my, love my phone

So unigrams represent single words, bigrams represent two consecutive words, and trigrams represent three consecutive words.
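
The n-gram definition can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the helper name ngrams is made up for this example, and the final period of the sentence is dropped by hand so the output matches the lists in these notes.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Assumption: the sentence is already split into words, without the period.
phone_tokens = "I love my phone".split()

unigrams = ngrams(phone_tokens, 1)
bigrams = ngrams(phone_tokens, 2)
trigrams = ngrams(phone_tokens, 3)

print(unigrams)  # ['I', 'love', 'my', 'phone']
print(bigrams)   # ['I love', 'love my', 'my phone']
print(trigrams)  # ['I love my', 'love my phone']
```

A sentence with k words yields k - n + 1 n-grams, which is why the four-word example has three bigrams and two trigrams.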

Tokenization
Tokenization is the process of splitting a text object into smaller units, called tokens.

1) White space tokenization

Also known as unigram tokenization: the text is split into words wherever whitespace occurs.

For example, take the sentence “I went to New-York to play football.”

It is split into the following tokens: “I”, “went”, “to”, “New-York”, “to”, “play”, “football.”
Notice that “New-York” is not split further, and the final period stays attached to “football.”, because the tokenization is based on whitespace only.
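
As a minimal sketch, whitespace tokenization of the example sentence can be done with Python's built-in str.split, which splits on runs of whitespace when called without arguments. The variable names are chosen for this illustration only.

```python
sentence = "I went to New-York to play football."

# split() with no arguments splits on any run of whitespace
whitespace_tokens = sentence.split()

print(whitespace_tokens)
# ['I', 'went', 'to', 'New-York', 'to', 'play', 'football.']
```

The hyphenated word and the trailing period survive untouched, matching the tokens listed above.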

2) Regular Expression Tokenization
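
The notes give no details for this method, so the following is only a sketch of one common approach: using a regular expression to decide what counts as a token. The pattern \w+ (runs of letters, digits, and underscores) is an assumption chosen for illustration; other patterns give other token sets.

```python
import re

regex_sentence = "I went to New-York to play football."

# Assumed pattern: every maximal run of word characters is one token,
# so hyphens and punctuation act as separators.
regex_tokens = re.findall(r"\w+", regex_sentence)

print(regex_tokens)
# ['I', 'went', 'to', 'New', 'York', 'to', 'play', 'football']
```

Unlike whitespace tokenization, this pattern splits “New-York” into two tokens and drops the final period.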

Normalization
Morpheme: the base form of a word

Tokens are made up of two main components: the morpheme, which is the base word, and the inflectional form, which is a prefix or suffix attached to the morpheme

Normalization is the process of converting a token into its base form

Types of normalization
Stemming
A rule-based process that removes inflectional forms from tokens; the output is the stem of the word

Stemming is often not preferred because it can produce words that are not in the dictionary; for example, “winning” can turn into “winn”
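
To make the “winn” problem concrete, here is a toy suffix-stripping stemmer. It is a sketch under stated assumptions, not a real stemming algorithm: the rule set SUFFIXES and the function naive_stem are invented for this example, and more elaborate rule sets may treat doubled consonants differently.

```python
# Assumed toy rule set: strip the first matching suffix, nothing else.
SUFFIXES = ("ing", "ed", "s")

def naive_stem(word):
    """Strip one known suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

stems = [naive_stem(w) for w in ["winning", "laughed", "laughs", "laugh"]]
print(stems)  # ['winn', 'laugh', 'laugh', 'laugh']
```

The rules work for “laughed” and “laughs” but leave “winn”, which is not a dictionary word, because no rule knows about the doubled consonant.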

Lemmatization
A systematic, step-by-step process for removing the inflectional forms of a word

It makes use of vocabulary, word structure, part-of-speech tags, and grammar

The output of lemmatization is the root word, called a lemma
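
The role of vocabulary and part-of-speech tags can be illustrated with a toy lookup. This is only a sketch: LEMMA_TABLE, its three entries, the tag names, and toy_lemmatize are all hypothetical, and a real lemmatizer relies on a full dictionary and morphological analysis.

```python
# Hypothetical miniature vocabulary: (word, part-of-speech tag) -> lemma
LEMMA_TABLE = {
    ("winning", "VERB"): "win",
    ("teams", "NOUN"): "team",
    ("better", "ADJ"): "good",
}

def toy_lemmatize(word, pos):
    """Look up the lemma for a word and tag; fall back to the word itself."""
    return LEMMA_TABLE.get((word.lower(), pos), word)

print(toy_lemmatize("winning", "VERB"))  # win
print(toy_lemmatize("winning", "NOUN"))  # winning
print(toy_lemmatize("better", "ADJ"))    # good
```

The same surface form maps to different lemmas depending on its tag, and the output is always a dictionary word, unlike the stem “winn”.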

Parts of Speech (PoS) Tags in NLP

PoS tags are properties of words that define their main context

Common PoS tags include nouns, verbs, adjectives, adverbs, etc.

PoS tags have many applications and are used in a variety of tasks, such as text cleaning, feature engineering, and word sense disambiguation

Grammar in NLP
Grammar is the set of rules for forming well-structured sentences.

Types of grammar:

Constituency Grammar
Any word or group of words can be termed a constituent

It organizes any sentence into its constituents using their properties

These properties are driven by their part-of-speech tags, such as noun or verb

Another way to look at constituency grammar is to define the grammar in terms of part-of-speech tags.

Dependency Grammar
Dependency Grammar is a type of grammar that organizes the words in a sentence
based on their dependencies, with one word acting as the root and all other words
linked to it directly or indirectly. These dependencies represent relationships among words and are
used to infer sentence structure and semantics. Each dependency can be
represented as a triplet containing a governor, a relation, and a dependent.
Dependency grammars are used in various applications, including Named Entity
Recognition, Question Answering Systems, Coreference Resolution, Text
Summarization, and Text Classification.
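
The triplet representation can be sketched as a small data structure. The triplets below are hand-written for the sentence “I love my phone” rather than produced by a parser, the relation labels follow the Universal Dependencies naming as an assumption, and Dependency and find_root are names invented for this example.

```python
from collections import namedtuple

# One dependency = (governor, relation, dependent)
Dependency = namedtuple("Dependency", ["governor", "relation", "dependent"])

# Hand-annotated triplets (an assumption, not parser output)
triplets = [
    Dependency("love", "nsubj", "I"),
    Dependency("love", "obj", "phone"),
    Dependency("phone", "nmod:poss", "my"),
]

def find_root(deps):
    """Return the word that governs others but depends on none.

    Assumes every word appears only once in the sentence.
    """
    governors = {d.governor for d in deps}
    dependents = {d.dependent for d in deps}
    roots = governors - dependents
    return roots.pop() if len(roots) == 1 else None

dep_root = find_root(triplets)
print(dep_root)  # love
```

Here “my” is linked to the root only indirectly, through “phone”, which shows how the triplets together form a tree over the sentence.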
