Natural Language Processing (NLP)

nlp

Uploaded by

solomonmcs52
Copyright
© © All Rights Reserved
Natural Language Processing (NLP)
Chapter One
Introduction to Natural Language Processing (NLP)
Instructor Name: Sertse Abebe Ayalew
Email: sertsea@bdu.edu.et or
sertse26@gmail.com
Evaluation
Assignment 15%
Article Review 15%
Exam 70%
Course objectives
• To enable students to understand different issues
concerning the creation of computer programs that
can interpret, generate, and learn natural language.
• Among the issues that will be discussed are:
morphological processing, syntactic processing,
semantic interpretation
• The primary emphasis of this course is on text-based
language processing
• Speech processing will be discussed.

2/20/2007 Husni Al-Muhtaseb


Tentative Weekly Schedule
Week  Topic
1     Introduction
2     Regular Expressions & Automata
3     Morphology and Text Processing
4     Spelling Correction and N-Gram Models
5     Part-of-Speech Tagging
6     Parsing
7     Speech Recognition
8     Word Vectors and Semantics

Text Books
• Speech And Language Processing: An
Introduction to Natural Language Processing,
Computational Linguistics, and Speech
Recognition, By Daniel Jurafsky and James H.
Martin, Prentice-Hall, 2017.
Introductions
• What is Natural Language?
– language that is used for everyday communication
– languages like Amharic, Oromiffa, Tigrigna, English, Hindi or
Portuguese.

• Phonemes (in speech) and letters of the alphabet (in writing) are the smallest structural units
• A collection of phonemes (letters) creates words
– an arbitrary sequence of letters forms a string
– not every string forms a word
– only legal strings form words
• A collection of words creates a vocabulary
– A vocabulary consists of the set of words that are legal within a language.
– A text (speech) is composed of a sequence of words from a vocabulary.
– A language consists of the set of all possible legal texts (speech).
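The hierarchy above (alphabet, string, word, vocabulary, text) can be sketched in a few lines of Python. This is only an illustration: `ALPHABET` and `VOCABULARY` are hypothetical, tiny stand-ins for a real language, and the check covers vocabulary membership only, not whether a text is grammatically legal.

```python
# Toy model of the alphabet -> string -> word -> vocabulary -> text hierarchy.
# ALPHABET and VOCABULARY are hypothetical, tiny stand-ins for a real language.
ALPHABET = set("abcdefghijklmnopqrstuvwxyz")
VOCABULARY = {"the", "book", "is", "a", "buzz", "zero"}


def is_string(s: str) -> bool:
    """Any non-empty sequence of alphabet symbols is a string."""
    return len(s) > 0 and all(ch in ALPHABET for ch in s)


def is_word(s: str) -> bool:
    """Only strings listed in the vocabulary count as words."""
    return is_string(s) and s in VOCABULARY


def is_text_over_vocabulary(text: str) -> bool:
    """A text is a sequence of words drawn from the vocabulary."""
    tokens = text.lower().split()
    return len(tokens) > 0 and all(is_word(t) for t in tokens)


print(is_string("kbzo"), is_word("kbzo"))  # a string, but not a word
print(is_text_over_vocabulary("The book is a buzz"))
```

Deciding whether a text is legal in the full sense also needs the syntactic and semantic levels discussed in the following slides.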
Levels of Language
• Phonemes and morphemes: a phoneme is the smallest unit of sound that distinguishes one word from another; a morpheme is the smallest meaningful unit of a word.
• Possible studies
– Phonological analysis: the study of the analysis and synthesis of phonemes is called phonology.
– Morphological analysis: the study of the analysis and synthesis of morphemes is called morphology.
– Semantic analysis: What is the literal meaning of words?
• Phrases (sentences): legal units of language formed from one or more words.
– Syntactic analysis: What kind of phrases and grammatical rules are we dealing with? Which words modify one another?
• What is the proper ordering of words?
– Semantics: What is the literal meaning of phrases, sentences, and paragraphs?
– Pragmatics: What should you conclude from the fact that I said something? How should you react?
Communications
[Figure: components of communication]
• Alphabets: {A-Z, a-z, 0-9, @, #, $, %, ^, &, …} and acoustic sounds
• Vocabulary: A, Abide, Book, Buzz, is, the, Xylophone, zero, …
• Syntax and grammar
• Meaning: synonym, antonym, …
• Social contexts: social uses of all components


Cont’
• Purpose of natural language
The goal of producing and comprehending natural language is communication.
– Communication for the speaker
• Intention: Decide when and what information should be communicated. May require planning and reasoning about agents’ goals and beliefs.
• Generation: Translate the information to be communicated (the “language of thought”) into a string of words in the desired natural language.
• Synthesis: Output the string in the desired modality, text or speech.
Cont’

– Communication for the reader or hearer
• Perception: Map the sensed modality (reading or hearing) to a string of words.
• Analysis: Determine the information content of the string or speech. Determine the meaning of the information in its literal or
• Incorporation: Decide whether or not to believe the content of the string and add it to the permanent knowledge system.
Automated Natural Language processes

1. Automatic Natural Language Understanding
– Taking some spoken/typed sentence and working out what it means

2. Automatic Natural Language Generation
– Taking some formal representation of what you want to say and working out a way to express it in a natural (human) language (e.g., English)

NLP - Prof. Carolina Ruiz


General Natural Language processes

1. Generation
• Intention: S wants H to believe P
• Generation: S chooses words W
• Synthesis: S utters words W

2. Understanding
• Perception: H perceives words W” (ideally W” = W)
• Analysis: H infers possible meanings P1, P2, …, Pn for W”
• Disambiguation: H infers that S intended to convey Pi (ideally Pi = P)
• Incorporation: H decides to believe or disbelieve Pi



Cont’
Automated natural language processing usually has to accommodate the rules and conventions of a particular language, both to understand and to generate its components.
Morphology and Phonology
• Study of words
– Their internal structure
washing → wash + -ing
– How they are formed
bat → bats, rat → rats
write → writer, browse → browser
• Morphology tries to formulate rules
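As a rough illustration of what "formulating rules" can look like, the sketch below splits the example words into a stem and a suffix. The stem lexicon `STEMS`, the suffix list `SUFFIXES` and the two rules are assumptions made for this example only; real morphological analyzers use far richer resources.

```python
# Hypothetical stem lexicon and suffix list covering only the slide's examples.
STEMS = {"wash", "bat", "rat", "write", "browse"}
SUFFIXES = ["ing", "er", "s"]


def analyze(word: str):
    """Return (stem, suffix) if a rule applies, else (word, '')."""
    if word in STEMS:
        return word, ""
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            base = word[: -len(suffix)]
            # Rule 1: plain concatenation (wash + ing, bat + s).
            if base in STEMS:
                return base, "-" + suffix
            # Rule 2: a stem-final "e" was dropped before the suffix (write + er).
            if base + "e" in STEMS:
                return base + "e", "-" + suffix
    return word, ""


for w in ["washing", "bats", "rats", "writer", "browser"]:
    print(w, analyze(w))
```

Words outside the toy lexicon are returned unchanged, which shows why rule sets need broad lexical coverage to be useful.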


Syntax, semantics and Pragmatics
Every language has its own syntactic, semantic and pragmatic interpretation.
– Syntax: concerns the proper ordering of words and its effect on meaning.
• The dog bit the boy.
• The boy bit the dog.
• ሶሎሞን እቃ ቤት ገባ (Amharic: “Solomon entered the storeroom.”)
• እቃ ሰሎሞን ገባ ቤት (the same words in a scrambled, ill-formed order)
– Semantics: concerns the (literal) meaning of words, phrases, and sentences.
Cont’
• “plant” as a photosynthetic organism
• “plant” as a manufacturing facility
• “plant” as the act of sowing
• “ዘነበ” ግስ: ዝናቡ ዘነበ (Amharic: “ዘነበ” as a verb, “it rained”, literally “the rain rained”)
– Pragmatics: concerns the overall communicative and social context and its effect on interpretation. It requires detailed discourse analysis of the communication. It usually deals with
• Social use of language
• The study of how language is used to accomplish goals, and the influence of context on meaning
• Understanding the aspects of a language that depend on situation and world knowledge
Cont’
– Discourse generally deals with linguistic units larger than a single statement.
Disambiguation is a part of the NLP process
• Natural language is highly ambiguous and must be disambiguated.
Ambiguity is Ubiquitous
– Phonological ambiguity: arises in speech recognition.
• “youth in Asia” vs. “euthanasia”
• “recognize speech” vs. “wreck a nice beach”
– Syntactic ambiguity
• I saw a man with a glass
– Semantic ambiguity
• “ዘነበ” ስም (Amharic: “ዘነበ” as a noun, in contrast to its use as a verb)
Cont’
Ambiguity is Explosive
– Ambiguities compound to generate enormous numbers of possible interpretations.
– In English, a sentence ending in n prepositional phrases has over 2^n syntactic interpretations once n is greater than 1 (what about Amharic?).
– “I saw the man with the telescope.”: 2 parses
– “I saw the man on the hill with the telescope.”: 5 parses
– “I saw the man on the hill in Texas with the telescope.”: 14 parses
– “I saw the man on the hill in Texas with the telescope at noon.”: 42 parses
– “I saw the man on the hill in Texas with the telescope at noon on Monday.”: 132 parses
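The parse counts listed above (2, 5, 14, 42, 132) coincide with consecutive Catalan numbers, which is the usual way this growth is described when each prepositional phrase may attach to the verb phrase or to any preceding noun phrase. The sketch below reproduces the counts under that assumption; the function names are illustrative and not taken from the slides.

```python
from math import comb


def catalan(k: int) -> int:
    """k-th Catalan number: C(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)


def pp_attachment_parses(n_pps: int) -> int:
    """Parse count for a sentence ending in n_pps prepositional phrases,
    assuming the counts follow the Catalan sequence shifted by one."""
    return catalan(n_pps + 1)


for n in range(1, 6):
    # Columns: number of PPs, parse count, 2^n for comparison.
    print(n, pp_attachment_parses(n), 2 ** n)
```

From n = 2 onward the parse count is strictly larger than 2^n, and the gap widens quickly.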
Cont’
• Humor and ambiguity: Many jokes rely on the ambiguity of language.
• E.g.
– Policeman to little boy: “We are looking for a thief with a bicycle.” Little boy: “Wouldn’t you be better using your eyes?”
– የአማርኛ ቅኔ ሰምና ወርቅ (Amharic qene poetry, “wax and gold”: a surface meaning that hides a deeper one)
Natural Language Processing objectives

1. Understand languages
– Taking some spoken/typed sentence and working out what it means

2. Generate languages
– Taking some formal representation of what you want to say and working out a way to express it in a natural (human) language (e.g., English)



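Cont’

• Natural Language vs. Artificial Language
– ዝናቡ መጣ (Amharic: “The rain came.”)
– ፀሃይ ወጣች (“The sun came out.”)
– ዝናቡ ሲሄድ ፀሃይ ወጣች (“When the rain went away, the sun came out.”)
– “plant” as a photosynthetic organism
– “plant” as a manufacturing facility
– “plant” as the act of sowing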
Cont’
– Ambiguity is the primary difference between
natural and computer languages.
– Formal programming languages are designed to
be unambiguous, i.e. they can be defined by a
grammar that produces a unique parse for each
sentence in the language.
Cont
• What is NLP?
– The term Natural Language Processing (NLP) encompasses a broad set of techniques for the automated generation, manipulation and analysis of natural (human) languages.
– NLP focuses on developing systems that allow computers to communicate with people using everyday human language.
– Its research agenda evolved around the question of how computational methods can aid the understanding of human language.
Basic Terminology in NLP

• Token: Linguistic units such as words, punctuation marks, numbers or alphanumeric strings are known as tokens.
• Sentence: An ordered sequence of tokens.
• Corpus: A body of text, usually containing a large number of sentences.
• Part-of-speech (POS) tag: A word can be classified into one or more lexical categories, such as nouns, verbs, adjectives and articles. A POS tag is a symbol representing such a lexical category: NN (noun), VB (verb), JJ (adjective), AT (article).
cont’
• Parse Tree: A tree defined over a given sentence that
represents the syntactic structure of the sentence as
defined by a formal grammar.
Terminology in NLP tasks

• Tokenization: The process of splitting a sentence into its constituent tokens.
– For segmented languages such as English, the existence of whitespace makes tokenization easier. However, for languages such as Chinese the task is difficult since there are no explicit boundaries between words, and in Arabic clitics attach to neighbouring words, so whitespace alone does not separate all units.
– Sub-processes
• Section splitting: splitting a text into sections
• Sentence splitting: splitting a text into sentences
• Word splitting: splitting a text into words
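For a whitespace-segmented language such as English, sentence splitting and word splitting can be approximated with regular expressions. The following is a minimal sketch, not a production tokenizer: the two patterns are simplifying assumptions and will mis-split abbreviations such as "Dr." or "e.g.".

```python
import re


def split_sentences(text: str):
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str):
    """Tokens are words (optionally with an internal apostrophe) or single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)


text = "The ball is red. When will my book arrive?"
for sentence in split_sentences(text):
    print(tokenize(sentence))
```

Languages without explicit word boundaries need a different approach (for example dictionary-based or statistical segmentation), which this sketch does not attempt.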
Cont’
POS Tagging: Given a sentence and a set of POS tags, a common language processing task is to automatically assign a POS tag to each word in the sentence. For example, given the sentence “The ball is red”, the output of a POS tagger would be The/AT ball/NN is/VB red/JJ.
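The simplest possible tagger looks each word up in a dictionary. The sketch below reproduces the example output above; `LEXICON` is a hypothetical hand-written table, whereas practical taggers learn tags from annotated corpora and use context to resolve words with more than one possible tag.

```python
# Hypothetical hand-written lexicon covering only the example sentence.
LEXICON = {"the": "AT", "ball": "NN", "is": "VB", "red": "JJ"}


def pos_tag(tokens, default="NN"):
    """Tag each token by dictionary lookup, falling back to a default tag."""
    return [(tok, LEXICON.get(tok.lower(), default)) for tok in tokens]


def format_tagged(tagged):
    """Render (token, tag) pairs in the word/TAG notation used above."""
    return " ".join(f"{tok}/{tag}" for tok, tag in tagged)


print(format_tagged(pos_tag("The ball is red".split())))  # The/AT ball/NN is/VB red/JJ
```

Falling back to a noun tag for unknown words is a common baseline choice, assumed here for simplicity.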
Cont’
Morphological analysis:
• Morphology is concerned with the discovery and analysis of the internal structure of words, which are built from morphemes (stems and affixes).
• Morphemes are the smallest linguistic units possessing meaning.
Cont’
• Parsing: In the parsing task, a parser constructs the parse tree of a given sentence. Parsers may be built from grammar rules, from complex statistical models, or through supervised learning on labeled data.
• Parsing: Building the syntactic tree of a sentence
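To make the grammar-rule approach concrete, the sketch below runs CKY chart parsing over a toy grammar and counts the parse trees of a sentence instead of building them. The grammar `BINARY_RULES` and the lexicon `LEXICON` are assumptions invented for this example (in Chomsky normal form); they are not a grammar of English.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]
# Hypothetical lexicon: word -> possible categories.
LEXICON = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "hill": ["N"], "telescope": ["N"],
    "with": ["P"], "on": ["P"],
}


def count_parses(sentence: str, root: str = "S") -> int:
    """CKY chart parsing that counts, rather than builds, parse trees."""
    words = sentence.lower().split()
    n = len(words)
    # chart[(i, j)][A] = number of ways category A derives words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[(i, i + 1)][category] += 1
    for width in range(2, n + 1):           # span length
        for i in range(n - width + 1):      # span start
            j = i + width
            for k in range(i + 1, j):       # split point
                for parent, left, right in BINARY_RULES:
                    ways = chart[(i, k)][left] * chart[(k, j)][right]
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)][root]


print(count_parses("I saw the man with the telescope"))
print(count_parses("I saw the man on the hill with the telescope"))
```

Under this toy grammar the two sentences receive 2 and 5 parses, because each prepositional phrase can attach either to the verb phrase or to a preceding noun phrase.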
cont
• Named-entity recognition: Identifying pre-defined
entity types in a sentence
• Word sense disambiguation: Figuring out the exact
meaning of a word or entity.
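Word sense disambiguation can be illustrated with a simplified gloss-overlap heuristic (in the spirit of the Lesk algorithm): choose the sense whose definition shares the most words with the surrounding context. The sense inventory `SENSES`, its glosses and the stopword list are hand-written assumptions for the "plant" example used earlier, not entries from a real dictionary.

```python
# Hypothetical sense inventory: hand-written glosses for the word "plant".
SENSES = {
    "plant": {
        "organism": "a living organism that grows in soil and has leaves and roots",
        "factory": "a manufacturing facility where workers and machines make products",
        "sow": "to put seeds in the ground so that they grow",
    }
}
STOPWORDS = {"a", "an", "the", "in", "and", "that", "to", "so", "has", "where", "of", "is"}


def content_words(text: str):
    """Lower-cased words with edge punctuation stripped, minus stopwords."""
    return {w.strip(".,").lower() for w in text.split()} - STOPWORDS


def disambiguate(word: str, context: str) -> str:
    """Pick the sense whose gloss shares the most content words with the context."""
    context_set = content_words(context)
    scores = {
        sense: len(content_words(gloss) & context_set)
        for sense, gloss in SENSES[word].items()
    }
    return max(scores, key=scores.get)


print(disambiguate("plant", "The plant hired new workers to run the machines"))
print(disambiguate("plant", "Farmers plant seeds in the ground"))
```

Exact word matching is brittle ("grow" does not match "grows"), so practical systems add stemming or learned representations.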
Cont’
• Semantic role labeling: Extracting subject-
predicate-object triples from a sentence
Possible tasks involved in NLP
Linguistic Knowledge needed for NLP
• Phonetics and phonology: The study of
linguistic sounds and their relations to words.
• Morphology: The study of the internal structure of words and how they can be modified.
– Parsing complex words into their components
Cont’
• Syntax: The study of the structural relationships
between words in a sentence.
• Semantics: The study of the meaning of words,
and how these combine to form the meanings of
sentences
– Synonymy: fall & autumn
– Hypernymy & hyponymy (is a): animal & dog
– Meronymy (part of): finger & hand
– Homonymy: fall (verb & season)
– Antonymy: big & small
Cont’
• Pragmatics: Social use of language. The study of how language is used to accomplish goals, and the influence of context on meaning. Understanding the aspects of a language that depend on situation and world knowledge.
• Discourse: The study of linguistic units larger
than a single statement
Challenges
• Normalization: Different words or sentences can express the same meaning, which calls for the preparation of lookup tables. Example:
– Season of the year
• Fall
• Autumn
– Book delivery time
• When will my book arrive?
• When will I receive my book?
• Coming up with better disambiguation processes and techniques
– Phonetic and phonological disambiguation
– Syntactic disambiguation
– Semantic disambiguation
– Discourse analysis
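The lookup-table idea mentioned under normalization can be sketched as two dictionaries, one for words and one for whole questions. The table contents come from the examples above; the canonical labels (`"fall"`, `"book_delivery_time"`) and function names are hypothetical choices for this illustration.

```python
import re

# Hypothetical lookup tables built from the slide's examples.
WORD_TABLE = {"autumn": "fall"}
INTENT_TABLE = {
    "when will my book arrive": "book_delivery_time",
    "when will i receive my book": "book_delivery_time",
}


def normalize_word(word: str) -> str:
    """Map a word to its canonical form, or leave it unchanged."""
    word = word.lower()
    return WORD_TABLE.get(word, word)


def normalize_question(question: str):
    """Lower-case, strip punctuation and extra spaces, then look up; None if unknown."""
    key = re.sub(r"[^\w\s]", "", question.lower()).strip()
    key = re.sub(r"\s+", " ", key)
    return INTENT_TABLE.get(key)


print(normalize_word("Autumn"), normalize_word("Fall"))
print(normalize_question("When will my book arrive?"))
print(normalize_question("When will I receive my book?"))
```

Lookup tables only cover the variants someone has listed in advance, which is part of why normalization is counted among the challenges.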
Application of Natural Language Processing

• Spell and Grammar Checking and Correction
– Checking spelling and grammar
– Suggesting alternatives for the errors
• Word Prediction and Suggestion (auto fill)
– Predicting the next word that is highly probable to be typed by the user.
• Information Retrieval
– Finding information relevant to the user’s query
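Word prediction, listed above, is often introduced with bigram counts: suggest the word that most frequently followed the previous word in some training text. The sketch below assumes a hypothetical three-sentence corpus and omits the smoothing and longer contexts that practical N-gram models use.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following


def predict_next(following, word, k=1):
    """Return up to k most frequent continuations of `word`."""
    return [w for w, _ in following[word.lower()].most_common(k)]


# Hypothetical training corpus.
corpus = [
    "the dog bit the boy",
    "the boy bit the dog",
    "the boy bit the ball",
]
model = train_bigrams(corpus)
print(predict_next(model, "boy"))  # ['bit']
print(predict_next(model, "bit"))  # ['the']
```

A word never seen in training yields an empty suggestion list, which is the case smoothing techniques are designed to handle.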
Cont’
• Text Categorization: Assigning one or more predefined categories to a text
• Text Summarization: Generating a short summary from one or more documents, sometimes based on a given query.
• Question Answering: Answering questions with a short answer
• Information Extraction: Extracting important concepts from texts and assigning them to slots in a certain template
Cont’
• Machine Translation: Translating a text from
one language to another.
• Sentiment Analysis: Identifying sentiments and opinions stated in a text.
• Optical Character Recognition: Recognizing printed or handwritten texts and converting them to computer-readable texts.
• Speech Recognition: Recognizing spoken language and transforming it into text
Cont’
• Speech Synthesis: Producing spoken language from a text.
• Spoken Dialog Systems: Running a dialog between the user and the system
