A SHORT INTRODUCTION TO NATURAL LANGUAGE PROCESSING
AYSHA AKTHER
30/04/2018
What is NLP
Natural Language Processing (NLP) is the
subfield of Artificial Intelligence concerned with
building computer systems to perform useful and
interesting tasks involving human language.
More specifically, it’s about the algorithms that we
use to process language, the formal basis for those
algorithms, and the facts about human language that
allow those algorithms to work.
Why Should we Care?
Three trends
1. An enormous amount of information is now
available in machine readable form as natural
language text (newspapers, web pages, medical
records, financial filings, etc.)
2. Conversational agents are becoming an important
form of human-computer communication
3. Much of human-human interaction is now
mediated by computers via social media
Applications
Let’s take a quick look at three important application
areas:
Text analytics
Question answering
Machine translation
Text Analytics
Data mining of weblogs, microblogs, discussion
forums, message boards, user groups, and other
forms of user-generated media
Product marketing information
Political opinion tracking
Social network analysis
Buzz analysis (what’s hot, what topics are people
talking about right now)
Question Answering
Traditional information retrieval returns documents
or other resources that users must then read to satisfy
their information needs.
Question answering, on the other hand, directly
provides an answer to an information need posed as
a question.
[Figure: Web Q/A]
[Figure: Watson]
Machine Translation
The automatic translation of texts between
languages, known as machine translation (MT), is one of the oldest
non-numerical applications in Computer Science.
In the past 10 years or so, MT has gone from a
niche academic curiosity to a robust commercial
industry.
How?
All of these applications operate by exploiting
underlying regularities inherent in human
languages.
Sometimes in complex ways, sometimes in pretty
trivial ways.
Language structure → Formal models → Practical applications
NLP Terminology
Phonology − the study of how the sounds of a language are organized systematically.
Morphology − the study of how words are constructed from primitive
meaningful units.
Morpheme − the primitive unit of meaning in a language.
Syntax − the arrangement of words to make a sentence. It also involves
determining the structural role of words in sentences and in phrases.
Semantics − the study of the meaning of words and of how to
combine words into meaningful phrases and sentences.
Pragmatics − the study of how sentences are used and understood in different
situations, and of how the situation affects their interpretation.
Discourse − the study of how the immediately preceding sentence can
affect the interpretation of the next sentence.
NLP Ambiguity
Ambiguity is a fundamental problem in
computational linguistics.
Hence, resolving or managing ambiguity is a
recurrent theme in NLP.
Find at least 5 meanings of this sentence:
I made her duck
I cooked waterfowl for her benefit (to eat)
I cooked waterfowl belonging to her
I created the (ceramic?) duck she owns
I caused her to quickly lower her upper body
I waved my magic wand and turned her into
undifferentiated waterfowl
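Part of this ambiguity is lexical: “her” can be an object pronoun or a possessive determiner, and “duck” can be a noun or a verb. The minimal sketch below assumes a hypothetical four-word lexicon, enumerates every tag assignment, and discards those that break one toy constraint (a determiner must be followed by a noun). Three structural readings survive; the different senses of “made” (cooked, created, caused, transformed) then split these further.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to the part-of-speech tags it can take.
LEXICON = {
    "I": ["PRON"],
    "made": ["VERB"],
    "her": ["PRON", "DET"],    # object pronoun ("made her ...") or possessive ("her duck")
    "duck": ["NOUN", "VERB"],  # the bird, or the act of lowering one's body
}

def tag_sequences(words):
    """Yield every combination of tags the lexicon allows for the words."""
    for tags in product(*(LEXICON[w] for w in words)):
        yield list(zip(words, tags))

def is_well_formed(tagged):
    """Toy constraint: a determiner must be immediately followed by a noun."""
    following = tagged[1:] + [(None, None)]  # sentinel for the last word
    for (_, tag), nxt in zip(tagged, following):
        if tag == "DET" and nxt[1] != "NOUN":
            return False
    return True

def readings(sentence):
    words = sentence.split()
    return [seq for seq in tag_sequences(words) if is_well_formed(seq)]

for reading in readings("I made her duck"):
    print(" ".join(f"{w}/{t}" for w, t in reading))
```

The surviving readings roughly correspond to “made a duck for her” (her/PRON duck/NOUN), “caused her to duck” (her/PRON duck/VERB), and “made her waterfowl” (her/DET duck/NOUN).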
Steps in NLP
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Discourse Integration
Pragmatic Analysis
Lexical Analysis
A lexical analyser is a program which breaks a text
into lexemes (tokens).
The process of breaking a text up into its
constituent tokens is known as tokenisation.
Tokenisation can occur at a number of different
levels: a text could be broken up into paragraphs,
sentences, words, syllables, or phonemes.
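Problems:
Tokenisation is difficult in languages where word boundaries
are not explicitly marked (for example, Chinese, Japanese, or Thai, which are written without spaces between words).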
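A common baseline for text without explicit word boundaries is greedy maximum matching (MaxMatch): at each position, take the longest dictionary word that matches, and fall back to a single character when nothing matches. The sketch below is illustrative only; the tiny dictionary is hypothetical, and English written without spaces stands in for a language such as Chinese.

```python
# Hypothetical toy dictionary for illustration.
DICTIONARY = {"the", "table", "down", "there", "theta", "bled", "own"}

def max_match(text, dictionary):
    """Greedy longest-match segmentation of `text` into dictionary words."""
    words = []
    start = 0
    longest = max(len(w) for w in dictionary)
    while start < len(text):
        # Try the longest possible candidate first, then shrink it one character at a time.
        for end in range(min(len(text), start + longest), start, -1):
            if text[start:end] in dictionary:
                break
        else:
            end = start + 1  # no dictionary word starts here: emit one character
        words.append(text[start:end])
        start = end
    return words

print(max_match("thetabledownthere", DICTIONARY))
# ['theta', 'bled', 'own', 'there']
```

The greedy choice of “theta” over “the” derails the rest of the segmentation, even though “the table down there” was intended; this is one reason statistical segmenters compare whole alternative segmentations instead of committing to the longest match.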
Syntactic Analysis
Syntactic analysis is the process of analyzing a text,
made of a sequence of tokens (for example, words),
to determine its grammatical structure with respect
to a given formal grammar.
There are a number of algorithms researchers have
developed for syntactic analysis, but we consider
only the following simple methods −
Context-Free Grammar
Top-Down Parser
Let us create a grammar to parse the sentence −
“The bird pecks the grains”
Articles (DET) − a | an | the
Nouns − bird | birds | grain | grains
Noun Phrase (NP) − Article + Noun | Article + Adjective + Noun
= DET N | DET ADJ N
Verbs − pecks | pecking | pecked
Verb Phrase (VP) − Verb + Noun Phrase = V NP
Adjectives (ADJ) − beautiful | small | chirping
S → NP VP
NP → DET N | DET ADJ N
VP → V NP
Lexicon −
DET → a | the
ADJ → beautiful | chirping
N → bird | birds | grain | grains
V → peck | pecks | pecking
The parse tree can be created as shown −
S
├── NP
│   ├── DET: The
│   └── N: bird
└── VP
    ├── V: pecks
    └── NP
        ├── DET: the
        └── N: grains
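The grammar and lexicon above can be run with a small top-down (recursive-descent) parser. This sketch starts from S, predicts each alternative right-hand side, matches its symbols left to right, and backtracks through Python generators when a branch fails. The dictionaries mirror the rules listed above; the nested-tuple tree format is an illustrative choice, not part of the original grammar.

```python
# Phrase-structure rules from the grammar above: each non-terminal maps to its alternatives.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"], ["DET", "ADJ", "N"]],
    "VP": [["V", "NP"]],
}
# Lexicon from above: pre-terminal categories and the words they can cover.
LEXICON = {
    "DET": {"a", "the"},
    "ADJ": {"beautiful", "chirping"},
    "N": {"bird", "birds", "grain", "grains"},
    "V": {"peck", "pecks", "pecking"},
}

def expand(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` can cover tokens starting at pos."""
    if symbol in LEXICON:
        # Pre-terminal: match exactly one word.
        if pos < len(tokens) and tokens[pos].lower() in LEXICON[symbol]:
            yield (symbol, tokens[pos]), pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        # Top-down: predict this alternative, then match its symbols left to right.
        yield from expand_sequence(symbol, rhs, tokens, pos, [])

def expand_sequence(parent, rhs, tokens, pos, children):
    """Match the symbols in `rhs` in order, collecting subtrees under `parent`."""
    if not rhs:
        yield (parent, *children), pos
        return
    for subtree, next_pos in expand(rhs[0], tokens, pos):
        # Backtracking happens implicitly when a generator is exhausted.
        yield from expand_sequence(parent, rhs[1:], tokens, next_pos, children + [subtree])

def parse(sentence):
    """Return every S tree that covers the whole sentence."""
    tokens = sentence.split()
    return [tree for tree, end in expand("S", tokens, 0) if end == len(tokens)]

print(parse("The bird pecks the grains"))
```

For the example sentence this prints a single tree, ('S', ('NP', ('DET', 'The'), ('N', 'bird')), ('VP', ('V', 'pecks'), ('NP', ('DET', 'the'), ('N', 'grains'))))), matching the parse tree above. Naive top-down parsing like this loops forever on left-recursive rules (such as NP → NP PP), which this grammar happens to avoid.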
Semantic Analysis
Semantic analysis is concerned with understanding the meaning of language.
Understanding language means knowing how to use it.
The first step in any semantic processing is to look up the
individual words in the dictionary and extract their meanings.
The growth of the Internet has produced important resources
for semantic analysis:
discussion forums, wikis, books, etc.
Specific resources: WordNet (Princeton), FrameNet (Berkeley),
ConceptNet (MIT)
Machine translation: bilingual data repositories (official sources
such as the EU and parliaments); much text has already been
translated once
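Consider the sentences
“Jhony has a ball.”
“Ball has a Jhony.”
Both are grammatically well formed, but only the first is meaningful:
a ball cannot own a person.
Probably the most popular and widely used
resource for semantic analysis is WordNet.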
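To make the contrast between the two sentences concrete, here is a minimal sketch of a selectional restriction: the possessor in “X has a Y” must be animate. The feature dictionary is a hypothetical stand-in for the kind of information a lexical resource such as WordNet organizes; it does not use WordNet’s actual data or API.

```python
import re

# Hypothetical mini-lexicon with one coarse semantic feature per word.
FEATURES = {
    "jhony": {"animate": True},
    "ball": {"animate": False},
}

def is_meaningful_possession(sentence):
    """Check a sentence of the form 'X has a Y' against one selectional
    restriction: the possessor X must be animate."""
    words = re.findall(r"[a-z]+", sentence.lower())
    if len(words) != 4 or words[1] != "has" or words[2] not in ("a", "an"):
        raise ValueError("expected a sentence of the form 'X has a Y'")
    possessor, possessed = words[0], words[3]
    if possessor not in FEATURES or possessed not in FEATURES:
        raise KeyError("word missing from the toy lexicon")
    return FEATURES[possessor]["animate"]

print(is_meaningful_possession("Jhony has a ball."))  # True
print(is_meaningful_possession("Ball has a Jhony."))  # False
```

Both sentences pass the same syntactic check; only the semantic feature lookup tells them apart.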
Discourse Integration
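The meaning of an individual sentence may depend
on the sentences that precede it.
Bill had a red balloon. John wanted it.
Here, “it” can only be interpreted by referring back to the balloon mentioned in the previous sentence.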
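A crude way to link “it” back to “balloon” is to resolve each pronoun to the most recently mentioned entity whose type agrees with it. The sketch below assumes a hypothetical hand-written entity list; real coreference systems use far richer syntactic and semantic evidence.

```python
import re

# Hypothetical toy lexicon: known entity words and their coarse type.
ENTITIES = {"bill": "person", "john": "person", "balloon": "thing"}
# Pronouns and the entity type each one must agree with.
PRONOUNS = {"he": "person", "him": "person", "she": "person", "it": "thing"}

def resolve_pronouns(text):
    """Link each pronoun to the most recent earlier entity of a matching type
    (a recency-plus-agreement heuristic, not a full coreference model)."""
    mentioned = []  # entities in the order they appear in the discourse
    links = []
    for token in re.findall(r"[A-Za-z]+", text):
        word = token.lower()
        if word in PRONOUNS:
            for candidate in reversed(mentioned):
                if ENTITIES[candidate] == PRONOUNS[word]:
                    links.append((token, candidate))
                    break
        elif word in ENTITIES:
            mentioned.append(word)
    return links

print(resolve_pronouns("Bill had a red balloon. John wanted it."))
# [('it', 'balloon')]
```

John is the most recent entity, but he is a person, so the agreement check skips him and “it” links to the balloon from the previous sentence.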
Pragmatic Analysis
Pragmatic analysis is concerned with understanding text and dialogue in context. It derives
knowledge from external, commonsense information.
It involves methods of directly encoding context into language.
The boundary between semantics and pragmatics is not well
defined, so pragmatics is a difficult field for
computational linguistics.
"Meet me here a week from now with a stick about this big."
Without knowing the context, we cannot know the meanings
of me, here, now (and hence a week from now), and this big.
Use of NLP Algorithms
Summarize blocks of text using Summarizer to extract the most
important and central ideas while ignoring irrelevant information.
Create a chat bot using Parsey McParseface, a language parsing
deep learning model made by Google that uses Part-of-Speech
tagging.
Automatically generate keyword tags from content
using AutoTag, which leverages Latent Dirichlet Allocation (LDA), a technique that discovers
topics contained within a body of text.
Use Named Entity Recognition to identify the type of an extracted entity,
such as a person, place, or organization.
Use Sentiment Analysis to identify the sentiment of a string of
text, from very negative to neutral to very positive.
Reduce words to their root, or stem, using PorterStemmer,
or break up text into tokens using Tokenizer.
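PorterStemmer implements Porter’s algorithm, which applies ordered steps of suffix rules with conditions on the remaining stem. The sketch below is a much cruder suffix stripper, shown only to illustrate the idea of reducing inflected forms to a shared stem; the suffix list and minimum stem length are hypothetical choices.

```python
# Hypothetical suffix list for illustration; Porter's algorithm is far more careful.
SUFFIXES = ("ational", "fulness", "ness", "ment", "ing", "ed", "ly", "es", "s")

def crude_stem(word, min_stem=3):
    """Strip the longest listed suffix, keeping at least `min_stem` letters."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

print([crude_stem(w) for w in ["pecks", "pecking", "pecked", "birds"]])
# ['peck', 'peck', 'peck', 'bird']
```

The minimum stem length keeps short words such as “is” intact; without a real rule set, though, this stripper will still mangle many words that Porter’s algorithm handles.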
Use of Summarizer
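The internal method of the Summarizer algorithm is not described here. As a minimal sketch of extractive summarization in general, the code below uses a classic frequency heuristic: score each sentence by the average corpus frequency of its content words and keep the top k in their original order. The stopword list and the scoring rule are assumptions for illustration.

```python
import re
from collections import Counter

# Small hypothetical stopword list; real systems use much larger ones.
STOPWORDS = {"a", "an", "and", "as", "by", "for", "in", "is", "it", "its",
             "of", "on", "the", "to", "was", "with"}

def content_words(text):
    """Lowercased alphabetic words with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def summarize(text, k=1):
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / (len(words) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:k]))

text = ("Voyager 2 studies the outer planets. Voyager 2 visited Uranus and Neptune. "
        "The weather was nice.")
print(summarize(text, k=2))
```

The off-topic third sentence shares no frequent words with the rest, so it scores lowest and is dropped.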
Use of Tokenizer
Example 2
• Input: A sentence
"This is a sentence to tokenize. This sentence is here for the same reason."
Output:
[
"This",
"is",
"a",
"sentence",
"to",
"tokenize",
".",
"This",
"sentence",
"is",
"here",
"for",
"the",
"same",
"reason",
"."
]
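The output above can be reproduced with a single regular expression that treats each run of word characters as a token and each punctuation mark as a separate token. This is a minimal sketch; the Tokenizer algorithm referred to above may use different rules.

```python
import re

def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches runs of letters, digits, or underscores; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("This is a sentence to tokenize. This sentence is here for the same reason."))
```

A known limitation of this pattern is that contractions are split at the apostrophe: “don't” becomes ["don", "'", "t"].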
Use of AutoTag
Example 3
• Input: Long Wikipedia text
Voyager 2 is a space probe launched by NASA on August 20, 1977, to study the outer planets. Part of the Voyager program, it
was launched 16 days before its twin, Voyager 1, on a trajectory that took longer to reach Jupiter and Saturn but enabled further
encounters with Uranus and Neptune.[4] It is the only spacecraft to have visited either of the ice giants.
Its primary mission ended with the exploration of the Neptunian system on October 2, 1989, after having visited the Uranian
system in 1986, the Saturnian system in 1981, and the Jovian system in 1979. Voyager 2 is now in its extended mission to study
the outer reaches of the Solar System and has been operating for 40 years, 2 months and 11 days as of October 31, 2017. It
remains in contact through the Deep Space Network.[5]
At a distance of 115 AU (1.72×10¹⁰ km) from the Sun as of July 30, 2017,[6] Voyager 2 is the fourth of five spacecraft to
achieve the escape velocity that will allow them to leave the Solar System. The probe was moving at a velocity of 15.4 km/s
(55,000 km/h) relative to the Sun as of December 2014 and is traveling through the heliosheath.[6][7] Upon reaching interstellar
space, Voyager 2 is expected to provide the first direct measurements of the density and temperature of the interstellar plasma.
Output:
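AutoTag is described above as using LDA, which infers topics from word co-occurrence across documents. As a much simpler stand-in that needs no model fitting, the sketch below ranks non-stopword words by frequency and returns the most common ones as tags. It is a swapped-in technique, not LDA, and the stopword list and length cutoff are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration.
STOPWORDS = {"a", "an", "and", "as", "at", "by", "for", "from", "has", "in", "is",
             "it", "its", "of", "on", "that", "the", "to", "was", "with"}

def auto_tags(text, n=5, min_len=3):
    """Return the n most frequent non-stopword words of at least min_len letters."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) >= min_len)
    return [word for word, _ in counts.most_common(n)]

sample = ("Voyager 2 is a space probe launched by NASA. Voyager 2 is the only spacecraft "
          "to have visited either of the ice giants. Voyager 2 remains in contact through "
          "the Deep Space Network.")
print(auto_tags(sample, n=2))
# ['voyager', 'space']
```

Frequency alone cannot group related words into topics the way LDA does, but it is often a reasonable first baseline for keyword tags.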
Use of Sentiment Analysis
Example 4
• Input: Batch of sentences
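A minimal lexicon-based sketch of sentence-level sentiment for a batch of sentences, mapped onto the scale from very negative to very positive described earlier. The polarity scores, the negation rule, and the thresholds are all hypothetical and are not taken from the Sentiment Analysis algorithm mentioned above.

```python
import re

# Hypothetical polarity lexicon; real lexicons contain thousands of scored words.
POLARITY = {"love": 2, "great": 2, "good": 1, "nice": 1,
            "bad": -1, "poor": -1, "hate": -2, "terrible": -2}
NEGATORS = {"not", "never", "no"}

def score(sentence):
    """Sum word polarities, flipping the next polar word after a negator."""
    total, negate = 0, False
    for word in re.findall(r"[a-z']+", sentence.lower()):
        if word in NEGATORS:
            negate = True
        elif word in POLARITY:
            total += -POLARITY[word] if negate else POLARITY[word]
            negate = False
    return total

def label(total):
    """Map a score onto a five-point scale from very negative to very positive."""
    if total <= -2:
        return "very negative"
    if total < 0:
        return "negative"
    if total == 0:
        return "neutral"
    if total < 2:
        return "positive"
    return "very positive"

batch = ["I love this great phone.", "The battery is not good.",
         "It arrived on Tuesday.", "I hate the terrible screen."]
for sentence in batch:
    print(f"{label(score(sentence)):>13}  {sentence}")
```

Lexicon methods are transparent and fast, but they miss sarcasm, context, and any word that is absent from the lexicon.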
Open Source NLP Libraries
These libraries provide the algorithmic building blocks of NLP in real-world
applications. Algorithmia provides a free API endpoint for many of
these algorithms, so users never have to set up or provision servers and
infrastructure.
Apache OpenNLP: a machine learning toolkit that provides tokenizers,
sentence segmentation, part-of-speech tagging, named entity
extraction, chunking, parsing, coreference resolution, and more.
Natural Language Toolkit (NLTK): a Python library that provides modules
for processing text, classifying, tokenizing, stemming, tagging, parsing,
and more.
Stanford NLP: a suite of NLP tools that provides part-of-speech tagging,
named entity recognition, a coreference resolution system, sentiment
analysis, and more.
MALLET: a Java package that provides Latent Dirichlet Allocation,
document classification, clustering, topic modeling, information extraction,
and more.
THANK YOU