Module 1

Uploaded by

vishnoi.ayush05

MODULE-1

INTRODUCTION TO NATURAL LANGUAGE PROCESSING


DR. SOUMYA SANKAR GHOSH
ASSISTANT PROFESSOR
VIT BHOPAL UNIVERSITY
INTRODUCTION

• Language has been described as the most human of our faculties.
• Language is a relation between sound and meaning.
• As a mathematician would put it, language is a set of pairs, each consisting of a sound
followed by a meaning.
• One of the things that makes language interesting is that it is infinite or, more precisely,
that it is an infinite set of signs.
INTRODUCTION

• Consider the example below:

This is the house that Jack built.
This is the malt that lay in the house that Jack built.
This is the rat that ate the malt that lay in the house that Jack built.
This is the cat that chased the rat that ate the malt that lay in the house that Jack built.
• The idea is that we could keep extending each sentence in this way; we could
continue adding words or phrases ad infinitum.
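• The nesting in the Jack example can be sketched in a few lines of Python. This is only an illustration: the helper names `extend` and `sentences` are made up for this sketch, and the (noun, verb) pairs are copied from the example sentences above.

```python
def extend(phrase, noun, verb):
    """Wrap an existing noun phrase in one more relative clause."""
    return f"the {noun} that {verb} {phrase}"


def sentences(base, steps):
    """Yield one sentence per step, each longer than the one before."""
    phrase = base
    yield f"This is {phrase}."
    for noun, verb in steps:
        phrase = extend(phrase, noun, verb)
        yield f"This is {phrase}."


steps = [("malt", "lay in"), ("rat", "ate"), ("cat", "chased")]
for sentence in sentences("the house that Jack built", steps):
    print(sentence)
```

• Appending further (noun, verb) pairs to `steps` always yields a longer sentence, which is the sense in which the set of sentences has no upper bound.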
INTRODUCTION

• We have said that language is (a) an infinite set of (b) finitely long sequences of words,
where the words are all drawn from (c) a finite set.
• These ideas become clearer when we realize that exactly the same thing holds in a more
familiar setting: that of numerals.
• The digits form a finite set, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, yet there are infinitely many
numerals, and each numeral is a finitely long sequence of digits.
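• The numeral analogy can be made concrete with a short sketch. The generator below (the name `numerals` is an assumption of this sketch) never terminates on its own, yet every item it produces is a finite string over the ten digits.

```python
from itertools import count, islice, product

DIGITS = "0123456789"  # the finite set of symbols


def numerals():
    """Yield every numeral without leading zeros, shortest first, forever."""
    yield "0"
    for length in count(1):
        for combo in product(DIGITS, repeat=length):
            if combo[0] != "0":
                yield "".join(combo)


# Take only the first twelve items of the infinite sequence.
print(list(islice(numerals(), 12)))
```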
A NOTE ON NLP

• Computational linguistics is the study of computer systems for understanding and
generating natural language.
• Why should we be interested in such systems?
• Although the objectives of research in computational linguistics are widely varied, a primary
motivation has always been the development of specific practical systems which involve
natural language.
APPLICATION OF NLP

• Machine Translation:
• One primary aim of MT is to make the vast amount of scientific information available on the
Web in English accessible to non-English-speaking readers.
• Another is to translate for English speakers the hundreds of millions of Web pages written in
languages other than English.
• The goal of machine translation is to automatically translate a document from one
language to another.
APPLICATION OF NLP

• Information retrieval
• Many other language processing tasks are also related to the Web. One such task is automatic
information retrieval from natural language texts. In response to a query, the system
extracts the relevant text from a corpus and either displays the text or uses it to answer
the query directly.
• This is a generalization of simple web search, where instead of just typing keywords a user
might ask complete questions, ranging from easy to hard.
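• A minimal sketch of the retrieval step, assuming the crudest possible relevance measure (the number of words shared by the query and a document). The two-sentence corpus and the function names are invented for illustration; practical systems use far richer scoring.

```python
import re


def terms(text):
    """Lower-case a text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, corpus):
    """Return the document sharing the most words with the query, or None."""
    query_terms = terms(query)
    scored = [(len(query_terms & terms(doc)), doc) for doc in corpus]
    best_score, best_doc = max(scored, key=lambda pair: pair[0])
    return best_doc if best_score > 0 else None


# Hypothetical corpus, for illustration only.
corpus = [
    "Machine translation converts a document from one language to another.",
    "Tokenization breaks a string of words into tokens.",
]
print(retrieve("What does tokenization break a string into?", corpus))
```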
APPLICATION OF NLP

• Man-machine interfaces:
• Natural language seems the most convenient mode of communication with interactive
systems, particularly for people other than computer specialists.
• This area has several advantages over the first two application areas as a testing ground
for natural language systems.
• First, the input to such systems is typically simpler (both syntactically and semantically) than the
texts to be processed for machine translation or information retrieval.
• Second, the interactive nature of the application allows the system to be usable even if it
occasionally rejects an input.
NLP: SUMMARY

• Natural Language Processing (NLP) is a field of Artificial Intelligence (AI) that makes
human language intelligible to machines. NLP combines the power of linguistics and
computer science to study the rules and structure of language, and create intelligent
systems (run on machine learning and NLP algorithms) capable of understanding,
analyzing, and extracting meaning from text and speech.
NLP BENEFITS

• Perform large-scale analysis. Natural Language Processing helps machines
automatically understand and analyze huge amounts of unstructured text data, like social
media comments, customer support tickets, online reviews, news reports, and more.
• Automate processes in real-time. Natural language processing tools can help
machines learn to sort and route information with little to no human interaction –
quickly, efficiently, accurately, and around the clock.
• Tailor NLP tools to your industry. Natural language processing algorithms can be
tailored to your needs and criteria, like complex, industry-specific language – even
sarcasm and misused words.
STEPS OF NLP

• Morphological Analysis
• Syntactic Analysis
• Semantic Analysis
• Discourse Analysis
• Pragmatic Analysis
NLP TASK AND TECHNIQUES

• Many natural language processing tasks involve morphological, syntactic, and semantic
analysis, used to break down human language into machine-readable chunks.
• In morphological analysis, individual words are analyzed into their components, and
non-word tokens (such as punctuation) are separated from the words. For example, in the
phrase "Bill's house" the proper noun "Bill" is separated from the possessive suffix "'s."
• Syntactic analysis, also known as parsing or syntax analysis, identifies the syntactic
structure of a text and the dependency relationships between words, represented on a
diagram called a parse tree.
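• The "Bill's house" example of morphological analysis can be sketched as follows. This handles only the possessive suffix written with a straight apostrophe; the function name is an assumption of the sketch, and a real morphological analyzer covers many more affixes.

```python
import re


def split_possessive(phrase):
    """Split each word into a stem and a possessive 's suffix where present."""
    pieces = []
    for word in phrase.split():
        match = re.fullmatch(r"(.+)('s)", word)
        if match:
            pieces.extend(match.groups())  # e.g. ("Bill", "'s")
        else:
            pieces.append(word)
    return pieces


print(split_possessive("Bill's house"))  # ['Bill', "'s", 'house']
```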
NLP TASK AND TECHNIQUES

• Semantic analysis focuses on identifying the meaning of language. However, since
language is polysemic and ambiguous, semantics is considered one of the most challenging
areas in NLP.
• Semantic tasks analyze the structure of sentences, word interactions, and related
concepts, in an attempt to discover the meaning of words, as well as understand the topic
of a text.
• Let’s look at some of the main sub-tasks of morphological, syntactic, and semantic analysis:
MORPHOLOGICAL ANALYSIS

• Tokenization
• Tokenization is an essential task in natural language processing used to break up a string of
words into semantically useful units called tokens.
• Sentence tokenization splits a text into sentences, and word tokenization splits a sentence
into words. Generally, word tokens are separated by blank spaces, and sentence tokens
by full stops. However, you can perform high-level tokenization for more complex structures, like
words that often go together, otherwise known as collocations (e.g., New York).
MORPHOLOGICAL ANALYSIS

• Example:
• Customer service couldn’t be better! → “customer service” “could” “not” “be” “better”
• Input text: राम स्कूल गया (Hindi for "Ram went to school")
  Address   Token
  0         राम
  1         स्कूल
  2         गया
• http://sampark.iiit.ac.in/tokenizer/web/restapi.php/indic/tokenizer
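• A minimal sketch of sentence and word tokenization, assuming the simple rules stated earlier (sentences end at stops, words are separated by blank spaces). It does not call the tokenizer service linked above, and unlike the slide's English example it neither splits "couldn't" into "could" and "not" nor groups "customer service" as a collocation. The function names and the second English sentence are made up for illustration.

```python
import re

PUNCT = ".,!?;:\"'()"


def sentence_tokenize(text):
    """Split a text after '.', '!' or '?' when whitespace follows."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(sentence):
    """Split on blank spaces, then trim punctuation from each token's edges."""
    tokens = (tok.strip(PUNCT) for tok in sentence.split())
    return [tok for tok in tokens if tok]


text = "Customer service couldn't be better! I will call again."
for sentence in sentence_tokenize(text):
    print(word_tokenize(sentence))

# Reproduce the address/token table for the Hindi input.
for address, token in enumerate(word_tokenize("राम स्कूल गया")):
    print(address, token)
```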
MORPHOLOGICAL ANALYSIS

• POS Tagger
• Part-of-speech tagging (abbreviated as PoS tagging) involves adding a part-of-speech
category to each token within a text. Some common PoS tags are verb, adjective, noun,
pronoun, conjunction, preposition, and interjection, among others.
• Lemmatization
• When we speak or write, we tend to use inflected forms of a word (words in their different
grammatical forms). To make these words easier for computers to understand, NLP uses
lemmatization and stemming to transform them back to their root form.
MORPHOLOGICAL ANALYSIS

• When we refer to stemming, the root form of a word is called a stem. Stemming "trims"
words, so word stems may not always be semantically correct.
• For example, stemming the words “consult,” “consultant,” “consulting,” and “consultants”
would result in the root form “consult.”
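• The "trimming" behaviour of stemming can be sketched with a toy suffix stripper. The suffix list below is hand-picked for the "consult" example and is not a real stemming algorithm such as Porter's; the minimum stem length of three letters is likewise an arbitrary choice for this sketch.

```python
SUFFIXES = ("ants", "ant", "ing", "s")  # illustrative only, longest first


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for w in ["consult", "consultant", "consulting", "consultants"]:
    print(w, "->", stem(w))  # each prints "... -> consult"

# Blind trimming can also produce a stem that is not a word at all:
print(stem("string"))  # str
```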
SYNTACTIC ANALYSIS

• Dependency grammar refers to the way the words in a sentence are connected. A dependency
parser, therefore, analyzes how ‘head words’ are related to and modified by other words in order to
understand the syntactic structure of a sentence:

Analyzing text is very hard
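• One way to store a dependency analysis of the example sentence is as a list of head-dependent arcs. The arcs below are hand-written for illustration, using Universal Dependencies-style relation names; they are an assumption of this sketch, not the output of a parser, and other annotation schemes would choose "is" as the root instead.

```python
# Each entry: (index, word, head_index, relation); head 0 is an artificial root.
PARSE = [
    (1, "Analyzing", 5, "csubj"),   # clausal subject of "hard"
    (2, "text", 1, "obj"),          # object of "Analyzing"
    (3, "is", 5, "cop"),            # copula attached to "hard"
    (4, "very", 5, "advmod"),       # adverb modifying "hard"
    (5, "hard", 0, "root"),
]


def root(parse):
    """Return the word whose head is the artificial root."""
    return next(word for _, word, head, _ in parse if head == 0)


def dependents(parse, head_index):
    """Return the words directly attached to the given head."""
    return [word for _, word, head, _ in parse if head == head_index]


print(root(PARSE))           # hard
print(dependents(PARSE, 5))  # ['Analyzing', 'is', 'very']
```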


NLP TASK AND TECHNIQUES

• Constituency Parsing
• Constituency parsing aims to capture the entire syntactic structure of a sentence according to
a phrase structure grammar. It builds a tree whose terminal nodes are the words and whose
abstract non-terminal nodes group those words into phrases.
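• A constituency tree can be represented as nested tuples of the form (label, children...). The tree below for "Analyzing text is very hard" is hand-written with Penn Treebank-style labels and is an assumption of this sketch, not parser output.

```python
TREE = (
    "S",
    ("S", ("VP", ("VBG", "Analyzing"), ("NP", ("NN", "text")))),
    ("VP", ("VBZ", "is"), ("ADJP", ("RB", "very"), ("JJ", "hard"))),
)


def is_preterminal(node):
    """A preterminal is a tag node holding exactly one word (a terminal)."""
    return len(node) == 2 and isinstance(node[1], str)


def leaves(node):
    """Collect the terminal nodes (the words) from left to right."""
    if is_preterminal(node):
        return [node[1]]
    return [word for child in node[1:] for word in leaves(child)]


def bracketed(node):
    """Render the tree in the usual bracketed notation."""
    if is_preterminal(node):
        return f"({node[0]} {node[1]})"
    return f"({node[0]} " + " ".join(bracketed(c) for c in node[1:]) + ")"


print(leaves(TREE))
print(bracketed(TREE))
```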
SEMANTIC ANALYSIS

• Word Sense Disambiguation
• Depending on their context, words can have different meanings. Take the word “book”, for
example:
• You should read this book; it’s a great novel!
• You should book the flights as soon as possible.
• You should close the books by the end of the year.
• You should do everything by the book to avoid potential complications.
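• The idea of choosing a sense from context can be sketched by counting overlaps between the sentence and a set of cue words per sense (a much simplified relative of the Lesk approach). The sense names and cue words below are hypothetical and hand-picked to fit the four example sentences; a real system would draw them from a dictionary or learn them from data.

```python
import re

# Hypothetical sense inventory for "book": sense name -> cue words.
SENSES = {
    "printed work": {"read", "novel", "author", "page"},
    "reserve": {"flights", "hotel", "table", "ticket"},
    "financial records": {"close", "books", "year", "accounts"},
    "rules (by the book)": {"by", "everything", "rules", "complications"},
}


def disambiguate(sentence):
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return max(SENSES, key=lambda sense: len(SENSES[sense] & words))


examples = [
    "You should read this book; it's a great novel!",
    "You should book the flights as soon as possible.",
    "You should close the books by the end of the year.",
    "You should do everything by the book to avoid potential complications.",
]
for sentence in examples:
    print(disambiguate(sentence), "<-", sentence)
```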
DISCOURSE ANALYSIS

• Text segmentation in NLP is the process of transforming text into meaningful units like
words, sentences, different topics, the underlying intent and more.
• Mostly, the text is segmented into its component words, which can be a difficult task,
depending on the language. This is again due to the complexity of human language.
• For example, it works relatively well in English to separate words by spaces, except for
words like "ice box" that belong together but are separated by a space. The problem is
that people sometimes also write it as "ice-box."
DISCOURSE ANALYSIS

• Named Entity Recognition
• Named entity recognition (NER) concentrates on determining which items in a text (i.e. the
"named entities") can be located and classified into pre-defined categories. These categories
can range from the names of persons, organizations and locations to monetary values and
percentages.

• For example:
• Before NER: Martin bought 300 shares of SAP in 2016.
• After NER: [Martin]Person bought 300 shares of [SAP]Organization in [2016]Time.
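• The before/after example can be reproduced with a toy tagger. The lookup table (gazetteer) and the year pattern are hypothetical stand-ins for a trained NER model and only cover this one sentence.

```python
import re

# Hypothetical gazetteer: a tiny lookup table standing in for a trained model.
GAZETTEER = {"Martin": "Person", "SAP": "Organization"}
YEAR = re.compile(r"(?:19|20)\d{2}")


def tag_entities(sentence):
    """Wrap known names and four-digit years in [entity]Category markers."""
    tagged = []
    for token in sentence.split():
        core = token.rstrip(".,;:!?")  # the word without trailing punctuation
        tail = token[len(core):]       # the punctuation that was removed
        if core in GAZETTEER:
            tagged.append(f"[{core}]{GAZETTEER[core]}{tail}")
        elif YEAR.fullmatch(core):
            tagged.append(f"[{core}]Time{tail}")
        else:
            tagged.append(token)
    return " ".join(tagged)


print(tag_entities("Martin bought 300 shares of SAP in 2016."))
```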
DISCOURSE ANALYSIS

• Relationship Extraction
• Relationship extraction takes the named entities of NER and tries to identify the semantic
relationships between them. This could mean, for example, finding out who is married to
whom, that a person works for a specific company and so on. This problem can also be
transformed into a classification problem and a machine learning model can be trained for
every relationship type.
DISCOURSE ANALYSIS

• Sentiment Analysis
• With sentiment analysis we want to determine the attitude (i.e. the sentiment) of a speaker or
writer with respect to a document, interaction or event. Therefore it is a natural language
processing problem where text needs to be understood in order to predict the underlying
intent. The sentiment is mostly categorized into positive, negative and neutral categories.
• With the use of sentiment analysis, for example, we may want to predict a customer's opinion
and attitude about a product based on a review they wrote. Sentiment analysis is widely
applied to reviews, surveys, documents and much more.
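• A minimal lexicon-based sketch of the positive / negative / neutral classification. The two word lists are hypothetical and tiny; production systems usually rely on trained models or much larger curated lexicons, and this sketch ignores negation and sarcasm entirely.

```python
import re

# Hypothetical mini-lexicon, for illustration only.
POSITIVE = {"great", "good", "love", "excellent", "better"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "broken"}


def sentiment(text):
    """Classify a text by counting positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this phone, the camera is excellent."))
print(sentiment("The battery is terrible and the screen arrived broken."))
print(sentiment("The package arrived on Tuesday."))
```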
CHALLENGES

• There are many challenges in natural language processing, but one of the main reasons
NLP is difficult is simply that human language is ambiguous.
• Even humans struggle to analyze and classify human language correctly.
• Take sarcasm, for example. How do you teach a machine to understand an expression
that’s used to say the opposite of what’s true? While humans would easily detect sarcasm
in the comment below, it would be challenging to teach a machine how to interpret this
phrase:
• “If I had a dollar for every smart thing you say, I’d be poor.”
CHALLENGES

• To fully comprehend human language, data scientists need to teach NLP tools to look
beyond definitions and word order, to understand context, word ambiguities, and other
complex concepts connected to messages. But they also need to consider other aspects,
like culture, background, and gender, when fine-tuning natural language processing models.
Sarcasm and humor, for example, can vary greatly from one country to the next.
NLP VS CL

• The intended distinction between computational linguistics and natural language processing is
that the label “computational linguistics” covers “works on the application of computers in
processing and analyzing language,” whereas the label “natural language processing” covers
“works on the computer processing of natural language for the purpose of enabling humans to
interact with computers in natural language.” The distinction, however, does not reflect current thought.
• Computational linguists tend to agree that “natural language processing” (NLP) and
“computational linguistics” (CL) mean pretty much the same thing (or, if different, that the
meaning of natural language processing is encompassed within the meaning of computational
linguistics). That means we can merge natural language processing and computational
linguistics relatively easily.
