Introduction To NLP

Natural Language Processing (NLP) is a subfield of Artificial Intelligence focused on enabling computers to understand and process human language. Key applications include text analytics, question answering, and machine translation, which utilize algorithms to analyze and interpret language. The document also discusses various NLP concepts, challenges such as ambiguity, and the steps involved in NLP processes.

A SHORT INTRODUCTION TO NATURAL LANGUAGE PROCESSING

AYSHA AKTHER

30/04/2018
What is NLP

• Natural Language Processing (NLP) is the subfield of Artificial Intelligence concerned with building computer systems to perform useful and interesting tasks involving human language.

• More specifically, it’s about the algorithms that we use to process language, the formal basis for those algorithms, and the facts about human language that allow those algorithms to work.

Why Should We Care?

Three trends:
1. An enormous amount of information is now available in machine-readable form as natural language text (newspapers, web pages, medical records, financial filings, etc.).
2. Conversational agents are becoming an important form of human-computer communication.
3. Much of human-human interaction is now mediated by computers via social media.

Applications

• Let’s take a quick look at three important application areas:
  • Text analytics
  • Question answering
  • Machine translation

Text Analytics

• Data mining of weblogs, microblogs, discussion forums, message boards, user groups, and other forms of user-generated media
  • Product marketing information
  • Political opinion tracking
  • Social network analysis
  • Buzz analysis (what’s hot, what topics people are talking about right now)


Question Answering

• Traditional information retrieval returns documents or resources that users then read to satisfy their information needs.
• Question answering, on the other hand, directly provides an answer to information needs posed as questions.

Web Q/A

Watson

Machine Translation

• The automatic translation of texts between languages is one of the oldest non-numerical applications in Computer Science.
• In the past 10 years or so, MT has gone from a niche academic curiosity to a robust commercial industry.


How?

• All of these applications operate by exploiting underlying regularities inherent in human languages, sometimes in complex ways, sometimes in pretty trivial ways.

Language structure | Formal models | Practical applications

NLP Terminology

• Phonology − The study of how sounds are organized systematically.
• Morphology − The study of how words are constructed from primitive meaningful units.
• Morpheme − The primitive unit of meaning in a language.
• Syntax − The arrangement of words to make a sentence. It also involves determining the structural role of words in the sentence and in phrases.
• Semantics − Concerned with the meaning of words and how to combine words into meaningful phrases and sentences.
• Pragmatics − Deals with using and understanding sentences in different situations and how the situation affects the interpretation of the sentence.
• Discourse − Deals with how the immediately preceding sentence can affect the interpretation of the next sentence.

NLP Ambiguity

• Ambiguity is a fundamental problem in computational linguistics.
• Hence, resolving, or managing, ambiguity is a recurrent theme.

• Find at least 5 meanings of this sentence: “I made her duck”
  • I cooked waterfowl for her benefit (to eat)
  • I cooked waterfowl belonging to her
  • I created the (ceramic?) duck she owns
  • I caused her to quickly lower her upper body
  • I waved my magic wand and turned her into undifferentiated waterfowl
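
One source of the readings listed for "I made her duck" is that single words can belong to more than one part of speech. The sketch below enumerates the candidate tag sequences for the sentence. The mini-lexicon and the tag names are assumptions made for illustration, and the enumeration only covers part-of-speech ambiguity, not the different senses of "made".

```python
from itertools import product

# Hypothetical mini-lexicon: each word maps to the parts of speech it can take.
LEXICON = {
    "i": ["PRON"],
    "made": ["VERB"],
    "her": ["PRON", "DET"],    # object pronoun, or possessive determiner
    "duck": ["NOUN", "VERB"],  # the bird, or the action of ducking
}

def tag_sequences(sentence):
    """Return every combination of part-of-speech tags the lexicon allows."""
    words = sentence.lower().split()
    options = [LEXICON[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]

readings = tag_sequences("I made her duck")
for reading in readings:
    print(reading)
print(len(readings), "candidate tag sequences")
```

Not every enumerated sequence corresponds to a sensible reading. Deciding which ones to keep is the disambiguation problem the later analysis steps have to manage.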

Steps in NLP

1. Lexical Analysis
2. Syntactic Analysis
3. Semantic Analysis
4. Discourse Integration
5. Pragmatic Analysis

Lexical Analysis

• A lexical analyser is a program that breaks a text into lexemes (tokens).
• The process of breaking a text up into its constituent tokens is known as tokenisation.
• Tokenisation can occur at a number of different levels: a text could be broken up into paragraphs, sentences, words, syllables, or phonemes.
• Problems
  • Tokenisation in languages where no word boundaries are explicitly marked
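
When word boundaries are not marked, one simple baseline is greedy longest-match segmentation against a word list (often called maximum matching). The sketch below is an illustration, not a method taken from these slides: the function name `max_match` and the tiny vocabulary are assumptions, and English text with the spaces removed stands in for a language written without them.

```python
def max_match(text, vocabulary):
    """Greedy longest-match segmentation of text that contains no spaces."""
    longest = max(map(len, vocabulary))
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for j in range(min(len(text), i + longest), i, -1):
            if text[i:j] in vocabulary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary word starts here: emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens

# Hypothetical vocabulary, chosen to match the example sentence used later.
VOCAB = {"the", "bird", "pecks", "grain", "grains"}
segmented = max_match("thebirdpecksthegrains", VOCAB)
print(segmented)
```

Because the procedure is greedy, it can commit to a long word that blocks the correct segmentation later in the string, which is one reason tokenisation is listed as a problem here.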

Syntactic Analysis

• Syntactic analysis is the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given formal grammar.
• Researchers have developed a number of algorithms for syntactic analysis, but we consider only the following simple methods −
  • Context-Free Grammar
  • Top-Down Parser

• Let us create a grammar to parse a sentence − “The bird pecks the grains”
  Articles (DET) − a | an | the
  Nouns (N) − bird | birds | grain | grains
  Noun Phrase (NP) − Article + Noun | Article + Adjective + Noun = DET N | DET ADJ N
  Verbs (V) − pecks | pecking | pecked
  Verb Phrase (VP) − NP V | V NP
  Adjectives (ADJ) − beautiful | small | chirping

• S → NP VP
• NP → DET N | DET ADJ N
• VP → V NP
• Lexicon −
  • DET → a | the
  • ADJ → beautiful | chirping
  • N → bird | birds | grain | grains
  • V → peck | pecks | pecking

• The parse tree can be created as shown −

(S
  (NP (DET The) (N bird))
  (VP (V pecks)
      (NP (DET the) (N grains))))
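
The rewrite rules and lexicon above are small enough to drive a top-down parser directly. The following is a minimal sketch of a backtracking recursive-descent parser over that grammar. The function names and the nested-tuple tree format are choices made for this example, and input words are lower-cased before lookup.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"], ["DET", "ADJ", "N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {
    "DET": {"a", "the"},
    "ADJ": {"beautiful", "chirping"},
    "N": {"bird", "birds", "grain", "grains"},
    "V": {"peck", "pecks", "pecking"},
}

def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for each way `symbol` can match tokens from pos."""
    if symbol in LEXICON:
        # Pre-terminal: consume one token if the lexicon allows it.
        if pos < len(tokens) and tokens[pos] in LEXICON[symbol]:
            yield (symbol, tokens[pos]), pos + 1
        return
    # Non-terminal: try each production in turn (top-down expansion).
    for production in GRAMMAR[symbol]:
        for children, end in match_sequence(production, tokens, pos):
            yield (symbol, *children), end

def match_sequence(symbols, tokens, pos):
    """Yield (list_of_trees, next_pos) for matching all symbols in order."""
    if not symbols:
        yield [], pos
        return
    for tree, mid in parse(symbols[0], tokens, pos):
        for trees, end in match_sequence(symbols[1:], tokens, mid):
            yield [tree] + trees, end

def parse_sentence(sentence):
    """Return the first parse that covers the whole sentence, or None."""
    tokens = sentence.lower().split()
    for tree, end in parse("S", tokens, 0):
        if end == len(tokens):
            return tree
    return None

tree = parse_sentence("The bird pecks the grains")
print(tree)
```

The grammar has no left-recursive rules, so this naive expansion terminates. A left-recursive rule such as NP → NP PP would need a different strategy.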


Semantic Analysis

• Semantic analysis is understanding language.
  • Understanding language means knowing how to use it.
  • The first step in any semantic processing is to look up the individual words in the dictionary and extract their meanings.
  • The growth of the Internet has produced important resources for semantic analysis:
    • discussion forums, wikis, books, etc.
    • specific resources: WordNet (Princeton), FrameNet (Berkeley), ConceptNet (MIT)
    • machine translation: bilingual data repositories (official sources such as the EU and parliaments), where many things have already been translated once

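• Consider the sentences
  • “Jhony has a ball.”
  • “Ball has a Jhony.”
  Both use the same words in the same pattern, but only the first makes sense; semantic analysis should tell them apart.
• Probably the most popular and widely used resource for semantic analysis (SA) is WordNet.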


Discourse Integration

• The meaning of an individual sentence can depend on the sentences that precede it.
  • Bill had a red balloon. John wanted it. (Here “it” refers to the red balloon.)
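
A very rough way to picture discourse integration is a recency heuristic: replace "it" with the most recent noun from the preceding sentence. The sketch below is a toy under loud assumptions. The noun list `KNOWN_NOUNS` and the function `resolve_it` are hypothetical, and real coreference resolution systems use much richer evidence than recency.

```python
import re

# Hypothetical noun list; a real system would use a tagger or parser instead.
KNOWN_NOUNS = {"balloon", "ball", "stick", "bird"}

def resolve_it(previous_sentence, sentence):
    """Replace 'it' with the most recent known noun in the previous sentence."""
    words = re.findall(r"[A-Za-z]+", previous_sentence.lower())
    antecedent = next((w for w in reversed(words) if w in KNOWN_NOUNS), None)
    if antecedent is None:
        return sentence  # nothing to link to, so leave the sentence unchanged
    return re.sub(r"\bit\b", f"the {antecedent}", sentence)

resolved = resolve_it("Bill had a red balloon.", "John wanted it.")
print(resolved)
```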

Pragmatic Analysis

• Understanding texts and dialogues: pragmatic analysis derives knowledge from external commonsense information.
• Methods of directly encoding context into language
• The boundary between semantics and pragmatics is not well defined, so pragmatics is a difficult field for computational linguistics.
  • "Meet me here a week from now with a stick about this big."
  • Without knowing the context, we cannot know the meanings of "me", "now" (and hence "a week from now"), or "this big".
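
To make the dependence on context concrete, the sketch below resolves some of the context-dependent expressions in the example once the speaker, place, and time of the utterance are supplied. The `Context` record, its field names, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Context:
    """Hypothetical record of the situation in which a sentence was uttered."""
    speaker: str
    location: str
    time: datetime

def resolve_deixis(context):
    """Map context-dependent expressions to concrete values."""
    return {
        "me": context.speaker,
        "here": context.location,
        "now": context.time,
        "a week from now": context.time + timedelta(weeks=1),
    }

# Illustrative values only; none of them come from the slides.
ctx = Context(speaker="Alice", location="the lab", time=datetime(2018, 4, 30, 10, 0))
resolved = resolve_deixis(ctx)
print(resolved["a week from now"])
```

"This big" stays unresolved here, because it depends on a gesture that none of these fields capture.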

Use of NLP Algorithms

• Summarize blocks of text using Summarizer to extract the most important and central ideas while ignoring irrelevant information.
• Create a chat bot using Parsey McParseface, a language-parsing deep learning model made by Google that uses part-of-speech tagging.
• Automatically generate keyword tags from content using AutoTag, which leverages LDA (Latent Dirichlet Allocation), a technique that discovers topics contained within a body of text.
• Identify the type of entity extracted, such as a person, place, or organization, using Named Entity Recognition.
• Use Sentiment Analysis to identify the sentiment of a string of text, from very negative to neutral to very positive.
• Reduce words to their root, or stem, using PorterStemmer, or break up text into tokens using Tokenizer.

Use of Summarizer

Use Of Tokenizer

Example 2
• Input: A sentence
"This is a sentence to tokenize. This sentence is here for the same reason."

Output:
[
"This",
"is",
"a",
"sentence",
"to",
"tokenize",
".",
"This",
"sentence",
"is",
"here",
"for",
"the",
"same",
"reason",
"."
]
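
Output of this shape can be approximated with a single regular expression. The sketch below is not the Tokenizer algorithm referred to on the slide. It is a stand-in that treats runs of word characters as tokens and each punctuation mark as its own token.

```python
import re

def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches one character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize(
    "This is a sentence to tokenize. This sentence is here for the same reason."
)
print(tokens)
```

A pattern this simple splits contractions and abbreviations awkwardly (for example "don't" becomes three tokens), so practical tokenizers add further rules.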

Use Of AutoTag

Example 3
• Input: Long Wikipedia text
Voyager 2 is a space probe launched by NASA on August 20, 1977, to study the outer planets. Part of the Voyager program, it was launched 16 days before its twin, Voyager 1, on a trajectory that took longer to reach Jupiter and Saturn but enabled further encounters with Uranus and Neptune. It is the only spacecraft to have visited either of the ice giants.

Its primary mission ended with the exploration of the Neptunian system on October 2, 1989, after having visited the Uranian system in 1986, the Saturnian system in 1981, and the Jovian system in 1979. Voyager 2 is now in its extended mission to study the outer reaches of the Solar System and has been operating for 40 years, 2 months and 11 days as of October 31, 2017. It remains in contact through the Deep Space Network.

At a distance of 115 AU (1.72×10^10 km) from the Sun as of July 30, 2017, Voyager 2 is the fourth of five spacecraft to achieve the escape velocity that will allow them to leave the Solar System. The probe was moving at a velocity of 15.4 km/s (55,000 km/h) relative to the Sun as of December 2014 and is traveling through the heliosheath. Upon reaching interstellar space, Voyager 2 is expected to provide the first direct measurements of the density and temperature of the interstellar plasma.

Output:

Use Of Sentiment Analysis

Example 4
• Input: Batch of sentences

Open Source NLP Libraries

• These libraries provide the algorithmic building blocks of NLP in real-world applications. Algorithmia provides a free API endpoint for many of these algorithms, without the need to set up or provision servers and infrastructure.
  • Apache OpenNLP: a machine learning toolkit that provides tokenizers, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, coreference resolution, and more.
  • Natural Language Toolkit (NLTK): a Python library that provides modules for processing text, classifying, tokenizing, stemming, tagging, parsing, and more.
  • Stanford NLP: a suite of NLP tools that provide part-of-speech tagging, a named entity recognizer, a coreference resolution system, sentiment analysis, and more.
  • MALLET: a Java package that provides Latent Dirichlet Allocation, document classification, clustering, topic modeling, information extraction, and more.


THANK YOU