
UNIT V INTELLIGENCE AND APPLICATIONS

Natural language processing – Morphological Analysis – Syntax Analysis – Semantic Analysis – All AI applications – Language Models – Information Retrieval – Information Extraction – Machine Translation – Machine Learning: Symbol-Based – Machine Learning: Connectionist.

NATURAL LANGUAGE PROCESSING


Natural Language: Natural languages are languages used by humans for communication. Examples: Hindi, Tamil, etc.
Natural Language Processing: NLP deals with processing, understanding and producing human languages by machine.
Need for processing human language: to develop Human-Computer Interaction (HCI) systems that operate through natural language.
Applications of NLP:
 Information Retrieval system
 Text Summarization
 Speech Recognition
 Text to Speech
 Machine Translation System
 Question answering system

Issues and processing complexities of NLP


1. Ambiguity: Words in any natural language usually have a number of different possible meanings.
Example: bank – financial bank, river bank, blood bank
For many NLP tasks, the proper sense of each ambiguous word in a sentence must be determined to interpret the correct meaning.
Example: He saw (the boy) with a telescope.
He saw (the boy with a telescope).
2. Language variability: There are various ways to express the same meaning. A large number of languages are in use worldwide, and most languages differ in character set, structure and grammar rules. It is difficult to design a language processing model that can capture this variety of languages.
Example: 22 official Indian languages; 3 different writing systems in Germany.
3. Difficult to incorporate human cognition.

Overview of NLP
Pipeline of NLP phases: Morphological Analysis/Lexical Analysis → Syntactic Analysis (Parsing) → Semantic Analysis → Discourse Analysis → Pragmatic Analysis.
NLP works with the hierarchical structure of language: text is a sequence of symbols, several words make a phrase, several phrases make a sentence that conveys an idea, and several sentences make a paragraph that conveys a context.
There are two main components of NLP:
(i) Natural language understanding
It involves mapping a given input in natural language to a useful representation.
(ii) Natural language generation
It involves producing meaningful phrases and sentences to convey some internal representation.

Phases of NLP
1. Morphological Analysis/Lexical Analysis
It involves identifying and analyzing the structure of words. During lexical analysis the whole text is divided into paragraphs, sentences and words.
2. Syntactic Analysis (Parsing)
It involves analyzing the words in a sentence for grammar and arranging them in a manner that shows the relationship among them.
Example: "The school goes to boy" {rejected by syntactic analysis}
3. Semantic Analysis
It involves identifying the exact dictionary meaning of the text, i.e., the meaningful formation of sentences.
Example: "hot ice cream" (not meaningful)
4. Discourse Analysis
The meaning of any sentence may depend on the meaning of the sentence that precedes it and may also influence the meaning of the sentence that follows it.
Example: "She wanted it." (not clear unless we are aware of the previous sentence)
5. Pragmatic Analysis
What was said is reinterpreted to determine what was actually meant. It involves deriving those aspects of language which require real-world knowledge.
Example: "Do you know who I am?" (the machine may not understand the intent behind the sentence)

MORPHOLOGICAL ANALYSIS

Morphology is the study of the structure and formation of words. Its most important unit is the morpheme, which is defined as the "minimal unit of meaning". Consider a word like "unhappiness". It has three parts: the prefix un-, the stem happy, and the suffix -ness.

The goal of a morphological parser is to discover the morphemes that build up a given word.
cats → cat + s
cat is the root or stem word
s is the plural suffix

Morphological analysis and parsing are essential for many NLP applications such as spelling error detection, machine translation and information retrieval. There are two broad classes of morphemes:
Stem: the main morpheme that carries the real meaning
Affixes: modify the meaning given by the stem

Affixes can be of 4 types:

i) Prefix: morphemes that appear before the stem
ii) Suffix: morphemes attached to the end of the stem
iii) Infix: morphemes that appear inside the stem
iv) Circumfix: morphemes applied both before and after the stem

There are three different ways to form words

1. Inflection
Inflection is the process of changing the form of a word so that it expresses information such as number,
person, case, gender, tense, mood and aspect, but the syntactic category of the word remains unchanged. As
an example, the plural form of the noun in English is usually formed from the singular form by adding an s.

 car / cars
 table / tables
 dog / dogs

In each of these cases, the syntactic category of the word remains unchanged.

2. Derivation
Inflection doesn't change the syntactic category of a word; derivation does. It combines a word stem with a grammatical morpheme, resulting in a word belonging to a different class.
Example:
teach (verb) → teacher (noun)

3. Compounding
Process of merging two or more words to form a new word.
Example: overlook, desktop, download
A morphological parser uses the following information sources:
i. Lexicon: a list of stems and affixes together with basic information about them.
ii. Morphotactics: the allowable ordering of morphemes that constitute a word.
iii. Orthographic rules: spelling rules that specify the changes that occur when two given morphemes are combined.
Example: y → ier, as in easy → easier
iv. Stemmer: Example: rotational → rotate
rule: ational → ate
Morphological analysis can be avoided if an existing lexicon is available that lists the features of all word forms of all root words in a language. However, it isn't practical to list all possible combinations of word forms in many languages. The simplest morphological systems are stemmers, which convert morphological variants of a given word to its stem. Stemmers do not require a lexicon; instead they use a set of rewrite rules to derive the root word, for example:
ies → y
ing → e

The stemming algorithm works in two steps:

1. Suffix removal: removes pre-defined endings from words.
2. Recoding: adds pre-defined endings to the output of the first step.
This model is implemented with a special type of finite state automaton called a finite state transducer (FST). A transducer maps one set of symbols to another. For example, to replace "cat" by "bat":
q0 --c:b--> q1 --a:a--> q2 --t:t--> q3
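The transducer above can be written down directly as a transition table. The following is a minimal sketch; the state names follow the q0..q3 labels in the notes, while the table layout and function name are illustrative assumptions:

```python
# A minimal finite-state transducer sketch for the c:b a:a t:t example above.
# Each transition consumes one input symbol and emits one output symbol.
transitions = {
    ("q0", "c"): ("q1", "b"),
    ("q1", "a"): ("q2", "a"),
    ("q2", "t"): ("q3", "t"),
}
accepting = {"q3"}

def transduce(word, start="q0"):
    state, output = start, []
    for ch in word:
        if (state, ch) not in transitions:
            return None                      # no transition for this symbol: reject
        state, out = transitions[(state, ch)]
        output.append(out)
    return "".join(output) if state in accepting else None

print(transduce("cat"))   # bat
print(transduce("cot"))   # None (rejected)
```

A stemmer's rewrite rules (such as ational → ate) can be encoded in the same way, with one transition per character and output symbols that differ from the input only where the rule rewrites the ending.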

SYNTAX ANALYSIS
Syntax analysis checks the text for grammaticality against the rules of a formal grammar. The objective of syntactic analysis is to find the syntactic structure of the sentence, which contributes to determining its meaning. The structure can be represented as a tree whose internal nodes represent phrases and whose leaf nodes are words. (A phrase like "hot ice-cream", in contrast, is syntactically well formed but would be rejected by the semantic analyzer.)
NP – Noun Phrase
An NP is a phrase whose head is a noun or pronoun, optionally accompanied by a set of modifiers.
NP → Noun
NP → Pronoun
NP → Det Noun
NP → Adj Noun
NP → Det Adj Noun
VP – Verb Phrase
A VP can simply be a verb, or a verb attached to a noun phrase, prepositional phrase, etc.
VP → Verb
VP → Verb NP
VP → Verb PP
Sentence
S → VP
S → NP VP

Wh-question structure
S → Wh NP VP
S → Wh NP Aux NP VP
Yes/No question
S → Aux NP VP

Types of Parsing
Derivation divides parsing into the following two types:
 Top-down Parsing
 Bottom-up Parsing

TOP-DOWN PARSING
In this kind of parsing, the parser starts constructing the parse tree from the start symbol and then tries to transform the start symbol into the input. The most common form of top-down parsing uses recursive procedures to process the input. Top-down parsing starts its search from the root node and works downwards towards the leaf nodes. The root is expanded using the grammar with 'S' as the start symbol. Each non-terminal symbol in the resulting sub-tree is then expanded using the appropriate grammar rules.

Grammar:
S → NP VP        S → VP
NP → Noun        NP → Pronoun        NP → Det Noun        NP → Det Noun PP
VP → Verb        VP → Verb NP        PP → Prep NP
Noun → student | ...        Verb → plays | paint | ...
Det → this | that | the        Pronoun → she | he | they
Preposition → from | with | on | to
Example 1: Paint the door
Level 1: start with S.
Level 2: expand S using S → NP VP and S → VP.
Level 3: expand NP (NP → Det Noun, NP → Pronoun, NP → Det Noun PP) and VP (VP → Verb, VP → Verb NP).
Level 4: the expansion that matches the input is
S → VP → Verb NP → Verb Det Noun
[S [VP [Verb paint] [NP [Det the] [Noun door]]]]
Example 2: Deepika reads the book
S → NP VP
[S [NP [Noun Deepika]] [VP [Verb reads] [NP [Det the] [Noun book]]]]
Example 3: Does this flight include a meal?
S → Aux NP VP
[S [Aux Does] [NP [Det this] [Noun flight]] [VP [Verb include] [NP [Det a] [Noun meal]]]]
Advantages
It never wastes time exploring trees that cannot result in an S, since it starts generating the tree from the start symbol of the grammar.
Disadvantages
It may generate trees that are inconsistent with the input, and it is time consuming because it checks each and every rule of the grammar.

BOTTOM-UP PARSING
In this kind of parsing, the parser starts with the input symbol and tries to construct the parse tree in an upward
direction towards the root. At each step the parser checks the rules in the grammar where the RHS matches
the portion of the parse tree constructed so far. It then reduces it using the LHS of the production. The parse
tree is considered to be successful if the parser reduces the tree to the start symbol of the grammar.
Example 1: Paint the door
paint the door
→ Verb Det Noun      (assign lexical categories)
→ Verb NP            (Det Noun reduced to NP)
→ VP                 (Verb NP reduced to VP)
→ S                  (VP reduced to S)
Example 2: Deepika reads the book
Deepika reads the book
→ Noun Verb Det Noun   (assign lexical categories)
→ NP Verb NP           (Noun reduced to NP; Det Noun reduced to NP)
→ NP VP                (Verb NP reduced to VP)
→ S                    (NP VP reduced to S)
Example 3: Does this flight include a meal?
Does this flight include a meal
→ Aux Det Noun Verb Det Noun   (assign lexical categories)
→ Aux NP Verb NP               (Det Noun reduced to NP)
→ Aux NP VP                    (Verb NP reduced to VP)
→ S                            (Aux NP VP reduced to S)
Advantages
It never wastes time exploring trees that do not match the input.
Disadvantages
It wastes time generating trees that have no chance of leading to an S-rooted tree.

Disadvantages of parsing
 Left recursion (leads to an infinite loop)
 Ambiguity can't be resolved
Example: "The boy saw the man with a telescope" can mean either "Using the telescope, the boy saw the man" or "The boy saw the man who had a telescope."
 Repeated parsing of the same sub-trees
Solution
A dynamic programming algorithm can solve these problems by constructing a table containing solutions to sub-problems.
 Earley parser (top-down)
 CYK parser (bottom-up)
CKY ALGORITHM ( Cocke–Younger–Kasami algorithm )
It is one of the earliest recognition and parsing algorithms. The standard version of CKY can only
recognize languages defined by context-free grammars in Chomsky Normal Form (CNF). It is also possible to
extend the CKY algorithm to handle some grammars which are not in CNF
Observation: any portion of the input string spanning i to j can be split at k, and structure can then be built
using sub-solutions spanning i to k and sub-solutions spanning k to j .

[Parse chart for "she eats a fish with a fork": the lexical row assigns NP, V, Det, N, P, Det, N; higher cells combine these into NP ("a fish", "a fork"), PP ("with a fork"), VP ("eats a fish", "eats a fish with a fork") and finally S spanning the whole sentence.]
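To make the observation concrete, here is a minimal CKY recognizer sketch for a toy grammar in Chomsky Normal Form covering the example sentence. The grammar fragment, category names and function name are illustrative assumptions, not taken from the notes:

```python
from collections import defaultdict

# Toy CNF grammar fragment. Lexical rules map words to categories;
# binary rules map a pair of categories to a parent category.
lexical = {"she": {"NP"}, "eats": {"V"}, "a": {"Det"},
           "fish": {"N"}, "fork": {"N"}, "with": {"P"}}
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"},
          ("Det", "N"): {"NP"}, ("VP", "PP"): {"VP"}, ("P", "NP"): {"PP"}}

def cky(words):
    n = len(words)
    table = defaultdict(set)              # table[i, j] = categories spanning words i..j-1
    for i, w in enumerate(words):
        table[i, i + 1] = set(lexical.get(w, ()))
    for span in range(2, n + 1):          # build longer spans from shorter ones
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # any span i..j can be split at k
                for B in table[i, k]:
                    for C in table[k, j]:
                        table[i, j] |= binary.get((B, C), set())
    return "S" in table[0, n]

print(cky("she eats a fish".split()))                 # True
print(cky("she eats a fish with a fork".split()))     # True
```

Each cell is filled only from smaller sub-spans, which is exactly the dynamic-programming idea used to avoid repeated parsing.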

SEMANTIC ANALYSIS
It involves mapping natural language utterances to some representation of meaning. A meaning representation language bridges the gap between linguistic knowledge and common-sense knowledge.
Example:
A cup on the table → on(cup, table)
Air India serves Hyderabad → serves(Air India, Hyderabad)
Examples of meaning representation languages are:
 First Order Predicate Calculus
 Semantic Networks
 Conceptual Dependencies
Characteristics of a meaning representation language
1. Verifiable: we should be able to identify the meaning and also verify it.
2. Unambiguous: there should not be any ambiguity; a sentence should have only one representation.
3. Supports canonical form: different phrasings with the same meaning should map to the same representation, e.g. all of the following map to serves(x, y):
Does Air India serve Hyderabad?
Does Air India offer a flight to Hyderabad?
Does Air India have a flight to Hyderabad?
4. Supports inference and variables
5. Expressiveness: a wide range of domains must be served.
Meaning representations can be created and assigned to linguistic units using two approaches:
(i) Syntax-driven semantic analysis
(ii) Semantic grammar
(i) Syntax driven semantic analysis
It uses syntactic constituents of a sentence to build its meaning representation.
Example: President nominates speaker
[Parse: S → NP VP; NP → Noun (President); VP → Verb (nominates) NP; NP → Noun (speaker)]
It is based on the "principle of compositionality", which states that the meaning of the whole sentence can be derived from the meanings of its parts. In reality the actual meaning of a sentence is derived from the constituent meanings together with the relationships between words, context, domain, word order and real-world knowledge.
The old man finally kicked the bucket → The old man finally died (idiom)
All roads lead to Rome → There are many ways to do a task (idiom)
Disadvantages
It depends upon traditional grammar rules, which aren't meant for semantic processing. It only provides abstract information, which many times does not fit the semantic meaning.
(ii) Semantic Grammar
Example: I want to go from Delhi to Chennai on 24th March.
InfoRequest → user wants to go from City to City on TimeExpr
Drawbacks
o It is domain dependent: whenever we add a new domain we have to write new rules.
o It builds semantic capabilities directly into the grammar.
o Rules and constituents correspond directly to entities and activities in the domain.
o Rules are domain specific, so new rules must be developed for each domain.
Example: "Indian hotel" and "Chinese food" need different rules because the two phrases belong to two different contexts.
Applications of semantic analysis
 Information retrieval
 Machine translation
 Question answering system
 Sentimental analysis
 Text classification
 Text analysis
Importance of semantic analysis
It is concerned with understanding the actual meaning by extracting contextual information.
Example: "Feeling hungry" → suggest nearby restaurants

WordNet: a large lexical database for the English language that lists all senses of a word.
Four important sense relationships are:
Homonymy: words with the same form but unrelated meanings
Polysemy: many related senses for a single word
Antonymy: words with opposite meanings
Synonymy: many words with a single meaning

LANGUAGE MODELS

A Language Model is a probabilistic model which predicts the probability that a sequence of tokens belongs
to a language.

How Language Models helps in NLP Tasks

The probabilities returned by a language model are mostly useful to compare the likelihood that different
sentences are "good sentences".
Spell checking, Automatic Speech Recognition, Machine Translation:

N-grams
o N-grams are the simplest tool available to construct a language model.
o An N-gram is a sequence of N words.
o An N-gram model predicts the probability of a given N-gram within any sequence of words
in the language.
P(s) = P(w1, w2, w3, ..., wn)
The chain rule of probability is an exact rewriting of this joint probability:

P(w1 ... wn) = P(w1) · P(w2 | w1) · P(w3 | w1 w2) · P(w4 | w1 w2 w3) · ... · P(wn | w1 ... wn−1)
             = ∏i P(wi | w1 ... wi−1)

Bigram-is the sequence of two words


Trigram- is the sequence of three words
Estimating probability

An n-gram model approximates each conditional probability using counts:

P(wn | wn−N+1 ... wn−1) = C(wn−N+1 ... wn−1 wn) / C(wn−N+1 ... wn−1)

Names for models

n = 1  unigram: P(wi)
n = 2  bigram: P(wi | wi−1) (Markov process)
n = 3  trigram: P(wi | wi−2 wi−1)
Problem 1:
Find the probability of the test sentence given below if a bigram model is trained on the following training sentences.
Training data:
1. The Arabian Knights
2. These are the fairy tales of the eat
3. The Stories of the Arabian Knights are translated primary languages.
Test Data:
The Arabian Knights are the fairy tales of the eat
Bigram Model Solution
Estimating bigram probabilities

• The Maximum Likelihood Estimate


P(wi | wi−1) = C(wi−1, wi) / C(wi−1)

Bigram probabilities estimated from the training sentences:

Sentence 1 (The Arabian Knights):
P(The|<s>) = 2/3 = 0.66,  P(Arabian|The) = 2/5 = 0.4,  P(Knights|Arabian) = 2/2 = 1

Sentence 2 (These are the fairy tales of the eat):
P(These|<s>) = 1/3 = 0.33,  P(are|These) = 1/1 = 1,  P(the|are) = 1/2 = 0.5,
P(fairy|the) = 1/5 = 0.2,  P(tales|fairy) = 1/1 = 1,  P(of|tales) = 1/1 = 1,
P(the|of) = 1/1 = 1,  P(eat|the) = 1/5 = 0.2

Sentence 3 (The Stories of the Arabian Knights are translated primary languages):
P(The|<s>) = 2/3 = 0.66,  P(Stories|the) = 1/5 = 0.2,  P(of|Stories) = 1/1 = 1,
P(the|of) = 1/1 = 1,  P(Arabian|the) = 2/5 = 0.4,  P(Knights|Arabian) = 2/2 = 1,
P(are|Knights) = 1/2 = 0.5,  P(translated|are) = 1/2 = 0.5,
P(Primary|translated) = 1/1 = 1,  P(languages|Primary) = 1/1 = 1
Test Solution
The Arabian Knights are the fairy tales of the eat
= P(The|<s>) · P(Arabian|The) · P(Knights|Arabian) · P(are|Knights) · P(the|are) · P(fairy|the) · P(tales|fairy) · P(of|tales) · P(the|of) · P(eat|the)
= (2/3) · 0.4 · 1 · 0.5 · 0.5 · 0.2 · 1 · 1 · 1 · 0.2
≈ 0.00267
Problem 2:
Training set
1. I am Sam
2. Sam I am
3. I am not Sam
Test set
1. I am not Sam
Solution
Bigram
= P(I|<s>) · P(am|I) · P(not|am) · P(Sam|not)
= (2/3) · (3/3) · (1/3) · (1/1)
≈ 0.222
Trigram
= P(I|<s><s>) · P(am|<s> I) · P(not|I am) · P(Sam|am not)
= (2/3) · (2/2) · (1/3) · (1/1)
≈ 0.222
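The hand calculations above can be reproduced mechanically. Below is a minimal sketch of a maximum-likelihood bigram model with <s> as a start symbol and no smoothing; the function names and rounding are illustrative:

```python
from collections import Counter

# Count unigrams and bigrams over the training sentences, with <s> marking
# the sentence start, then multiply the bigram probabilities of the test sentence.
def train_bigrams(sentences):
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def sentence_probability(sentence, unigrams, bigrams):
    tokens = ["<s>"] + sentence.lower().split()
    p = 1.0
    for w1, w2 in zip(tokens, tokens[1:]):
        p *= bigrams[(w1, w2)] / unigrams[w1]   # P(w2 | w1) = C(w1 w2) / C(w1)
    return p

training = ["I am Sam", "Sam I am", "I am not Sam"]
uni, bi = train_bigrams(training)
print(round(sentence_probability("I am not Sam", uni, bi), 3))   # 0.222
```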

In a unigram model, the probability of each word depends only on its own frequency in the corpus:
P(wn) = (number of times wn appears) / (total number of words in the corpus)
If the total number of words in the corpus is 10000 and 'the' appears 7000 times, then
P(the) = 7000/10000 = 0.7 = 70%

Data sparseness issue in n-gram models:

A single n-gram with zero probability causes the probability of the entire sentence to be zero. This can be avoided by smoothing the n-gram model.

INFORMATION RETRIEVAL
Information retrieval is the task of finding documents that are relevant to a user's need for information. The best-known examples of information retrieval systems are search engines on the World Wide Web.

[Diagram: the user submits a query → query processing → IR system → list of documents relevant to the query; the IR system draws on a document repository.]

A user requiring information formulates a request in the form of a query written in natural language. The retrieval system responds by retrieving the documents that seem relevant. Information retrieval deals with the organization, storage, retrieval and evaluation of information related to the user's query.
An IR model involves the following activities.
 Document Indexing
 Query Processing
 Document matching and Ranking

1. Document Indexing
In an IR system the actual text of the document isn't used directly in the retrieval process. Documents are represented through a set of index terms or keywords, and these index terms are used for accessing the required documents. Document indexing can be done automatically using the content of the document, or manually, where the set of keywords is defined by the creator of the document.
2. Query Processing
It involves finding the keywords or index terms in the user's query that are used for the retrieval process. The queries entered by the user are processed to remove words which do not contribute to retrieval.
 Stop word removal: Stop words are high-frequency words which have little semantic weight and are thus not helpful in the retrieval process. These words play important grammatical roles in language, such as in the formation of phrases, but do not contribute to the semantic content of a document in a keyword-based representation. Eliminating stop words considerably reduces the number of index terms.
Example: an, a, the, from, also, for
 Stemming: Stemming focuses on removing affixes.
Example: programs, programming, programmer → program
 Lemmatization: finds the root word (lemma) as listed in the dictionary.
3. Document Matching and Ranking
It involves finding the relevant documents in the document repository that may contain the answer to the user's query, and ranking them according to their relevance to the query.

An information retrieval system can be characterized by


1. A corpus of documents. Each system must decide what it wants to treat as a document: a paragraph, a page,
or a multipage text.
2. Queries posed in a query language. A query specifies what the user wants to know. The query language
can be just a list of words, such as [AI book]; or it can specify a phrase of words that must be adjacent, as in
[“AI book”]; it can contain Boolean operators as in [AI AND book]; it can include non-Boolean operators
such as [AI NEAR book] or [AI book site:www.aaai.org].
3. A result set. This is the subset of documents that the IR system judges to be relevant to the query. By
relevant, we mean likely to be of use to the person who posed the query, for the particular information need
expressed in the query.
4. A presentation of the result set. This can be as simple as a ranked list of document titles or as complex as
a rotating color map of the result set projected onto a three-dimensional space, rendered as a two-dimensional
display.
The earliest IR systems worked on a Boolean keyword model. Each word in the document collection is treated
as a Boolean feature that is true of a document if the word occurs in the document and false if it does not.
Advantage of Boolean keyword model
1. Simple to explain and implement.
Disadvantages of Boolean keyword model
1. The degree of relevance of a document is a single bit, so there is no guidance as to how to order the
relevant documents for presentation.
2. Boolean expressions are unfamiliar to users who are not programmers or logicians.
3. It can be hard to formulate an appropriate query, even for a skilled user.

IR scoring functions
A scoring function takes a document and a query and returns a numeric score; the most relevant documents
have the highest scores. In the BM25 function, the score is a linear weighted combination of scores for each
of the words that make up the query.
Three factors affect the weight of a query term:
1. First, the frequency with which a query term appears in a document. For the query [farming in Kansas],
documents that mention “farming” frequently will have higher scores.
2. Second, the inverse document frequency of the term, or IDF. The word “in” appears in almost every
document, so it has a high document frequency, and thus a low inverse document frequency, and thus
it is not as important to the query as “farming” or “Kansas.”
3. Third, the length of the document. A million-word document will probably mention all the query words,
but may not actually be about the query. A short document that mentions all the words is a much better
candidate.
The BM25 function takes all three of these into account. TF(qi, dj) is the count of the number of times word qi appears in document dj, and DF(qi) gives the number of documents that contain the word qi. In its standard form the score of a document dj for a query q1 ... qN is

BM25(dj, q1 ... qN) = Σi IDF(qi) · TF(qi, dj) · (k + 1) / ( TF(qi, dj) + k · (1 − b + b · |dj| / L) )

where IDF(qi) = log( (N − DF(qi) + 0.5) / (DF(qi) + 0.5) ), |dj| is the length of document dj in words, and L is the average document length in the corpus, L = Σi |di| / N. The two parameters, k and b, can be tuned by cross-validation.
It would be impractical to apply the BM25 scoring function to every document in the corpus. Instead, systems create an index ahead of time that lists, for each vocabulary word, the documents that contain it. This is called the hit list for the word. Then, given a query, we intersect the hit lists of the query words and only score the documents in the intersection.
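A sketch of how such a scoring function might be implemented over a small in-memory collection is given below. The toy documents, the parameter defaults k = 1.2 and b = 0.75, and the smoothed IDF form are illustrative assumptions:

```python
import math
from collections import Counter

# BM25 score of one document for a query over a toy tokenized collection.
def bm25_score(query, doc_id, docs, k=1.2, b=0.75):
    N = len(docs)
    L = sum(len(d) for d in docs.values()) / N           # average document length
    tf = Counter(docs[doc_id])                            # TF(q, d) for this document
    score = 0.0
    for q in query:
        df = sum(1 for d in docs.values() if q in d)      # DF(q)
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)   # smoothed inverse document frequency
        denom = tf[q] + k * (1 - b + b * len(docs[doc_id]) / L)
        score += idf * tf[q] * (k + 1) / denom
    return score

docs = {1: "farming in kansas is large scale farming".split(),
        2: "kansas city weather report".split(),
        3: "a short note about gardening".split()}
for d in docs:
    print(d, round(bm25_score(["farming", "kansas"], d, docs), 3))
```

Document 1, which mentions both query words and "farming" twice, receives the highest score, matching the three factors described above.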
Evaluation of IR system
Imagine that an IR system has returned a result set for a single query, for which we know which documents
are and are not relevant, out of a corpus of 100 documents.
                 In result set    Not in result set
Relevant              30                 20
Not relevant          10                 40
Evaluation of IR system can be done on the following points
 Time between submission of query and getting the response.
 Presentation format
 User effort to get the information
Precision measures the proportion of documents in the result set that are actually relevant.
Precision = (number of relevant documents retrieved) / (total number of documents retrieved) = 30/(30+10) = 0.75

Recall measures the proportion of all the relevant documents in the collection that are in the result set.
Recall = (number of relevant documents retrieved) / (total number of relevant documents in the collection) = 30/(30+20) = 0.60
In a very large document collection, such as the World Wide Web, recall is difficult to compute, because there
is no easy way to examine every page on the Web for relevance.

Page Rank Algorithm


PageRank is the algorithm used by Google Search to rank web pages in its search results. PageRank is a way of measuring the importance of website pages. It works by counting the number and quality of links to a page to determine a rough estimate of how important the page is. The underlying assumption is that more important websites are likely to receive more links from other websites.
The PageRank for a page p is defined as:

PR(p) = (1 − d)/N + d · Σi PR(ini) / C(ini)

where PR(p) is the PageRank of page p, N is the total number of pages in the corpus, ini are the pages that link to p, and C(ini) is the count of the total number of out-links on page ini. The constant d is a damping factor.
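A direct iterative implementation of this formula on a toy link graph might look like the following sketch; the graph and the damping value d = 0.85 are illustrative choices:

```python
# Iterative PageRank following the formula above.
def pagerank(links, d=0.85, iterations=50):
    pages = list(links)
    N = len(pages)
    pr = {p: 1.0 / N for p in pages}
    out_count = {p: len(links[p]) or 1 for p in pages}    # C(in_i); avoid division by zero
    for _ in range(iterations):
        new_pr = {}
        for p in pages:
            in_sum = sum(pr[q] / out_count[q] for q in pages if p in links[q])
            new_pr[p] = (1 - d) / N + d * in_sum
        pr = new_pr
    return pr

links = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}, "D": {"C"}}
print({p: round(r, 3) for p, r in pagerank(links).items()})
```

Page C, which receives links from three other pages, ends up with the highest rank in this toy graph.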

The HITS algorithm


The Hyperlink-Induced Topic Search algorithm, also known as “Hubs and Authorities” or HITS, is
another influential link-analysis algorithm. HITS differs from PageRank in several ways. First, it is a query-
dependent measure: it rates pages with respect to a query.

[Figure caption: The HITS algorithm for computing hubs and authorities with respect to a query. RELEVANT-PAGES fetches the pages that match the query, and EXPAND-PAGES adds in every page that links to or is linked from one of the relevant pages. NORMALIZE divides each page's score by the sum of the squares of all pages' scores.]

Given a query, HITS first finds a set of pages that are relevant to the query. It does that by intersecting
hit lists of query words, and then adding pages in the link neighborhood of these pages—pages that link to or
are linked from one of the pages in the original relevant set. Each page in this set is considered an authority on
the query to the degree that other pages in the relevant set point to it. A page is considered a hub to the degree
that it points to other authoritative pages in the relevant set. We don't want to merely count the number of links; we want to give more value to the high-quality hubs and authorities. Thus, as with
PageRank, we iterate a process that updates the authority score of a page to be the sum of the hub scores of the
pages that point to it, and the hub score to be the sum of the authority scores of the pages it points to. If we
then normalize the scores and repeat k times, the process will converge. Both PageRank and HITS played
important roles in developing our understanding of Web information retrieval.
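A compact sketch of the hub/authority iteration with normalization is shown below; the toy link graph, the iteration count, and the root-sum-of-squares normalization are illustrative choices consistent with the description above:

```python
import math

# Iterate hub and authority scores over a small link graph.
def hits(links, iterations=20):
    pages = set(links) | {p for targets in links.values() for p in targets}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # authority score: sum of hub scores of the pages that point to it
        auth = {p: sum(hub[q] for q in links if p in links[q]) for p in pages}
        # hub score: sum of authority scores of the pages it points to
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # normalize so the scores stay bounded between iterations
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return auth, hub

links = {"A": {"B", "C"}, "B": {"C"}, "D": {"A", "C"}}
auth, hub = hits(links)
print("authorities:", {p: round(s, 2) for p, s in auth.items()})
print("hubs:", {p: round(s, 2) for p, s in hub.items()})
```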

INFORMATION EXTRACTION
Information extraction is the process of acquiring knowledge by skimming a text and looking for
occurrences of a particular class of object and for relationships among objects. A typical task is to extract
instances of addresses from Web pages, with database fields for street, city, state, and zip code; or instances
of storms from weather reports, with fields for temperature, wind speed, and precipitation. In a limited domain,
this can be done with high accuracy.
1. Finite-state automata for information extraction
The simplest type of information extraction system is an attribute-based extraction system that assumes
that the entire text refers to a single object and the task is to extract attributes of that object.
For example, the problem of extracting from the text “IBM ThinkBook 970. Our price: $399.00” the
set of attributes {Manufacturer=IBM, Model=ThinkBook970, Price=$399.00}. We can address this problem
by defining a template (also known as a pattern) for each attribute we would like to extract. The template is defined by a finite state automaton, the simplest example of which is the regular expression, or regex.
Here is how to build up a regular expression template for prices:
[0-9] matches any digit from 0 to 9
[0-9]+ matches one or more digits
[.][0-9][0-9] matches a period followed by two digits
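For instance, the pieces above can be combined (with a leading dollar sign) into a single price template; the snippet below is an illustrative sketch using Python's re module:

```python
import re

# A dollar sign, one or more digits, optionally a period followed by two digits.
price = re.compile(r"\$[0-9]+([.][0-9][0-9])?")

text = "IBM ThinkBook 970. Our price: $399.00"
match = price.search(text)
print(match.group())   # $399.00
```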
One step up from attribute-based extraction systems are relational extraction systems, which deal with multiple objects and the relations among them. Thus, when such a system sees the text "$249.99," it needs to determine not just that it is a price, but also which object has that price. A relational extraction system can be built as a series of cascaded finite-state transducers: the system consists of a series of small, efficient finite-state automata (FSAs).
FASTUS consists of five stages:
1. Tokenization
2. Complex-word handling
3. Basic-group handling
4. Complex-phrase handling
5. Structure merging
FASTUS’s first stage is tokenization, which segments the stream of characters into tokens (words, numbers,
and punctuation). The second stage handles complex words, including collocations such as “set up” and “joint
venture”. For example, a company name might be recognized by the rule
CapitalizedWord+ (“Company” | “Co” | “Inc” | “Ltd”)
The third stage handles basic groups, meaning noun groups and verb groups. The idea is to chunk these into
units that will be managed by the later stages.
Example: Bridgestone Sports Co. said Friday it has set up a joint venture in Taiwan with a local concern
and a Japanese trading house to produce golf clubs to be shipped to Japan.

The fourth stage combines the basic groups into complex phrases. Again, the aim is to have rules that
are finite-state and thus can be processed quickly, and that result in unambiguous output phrases. One type of
combination rule deals with domain-specific events. For example, the rule
Company+ SetUp JointVenture (“with” Company+)?
captures one way to describe the formation of a joint venture. The final stage merges structures that were built
up in the previous step.

In general, finite-state template-based information extraction works well for a restricted domain. Finite-state
information extraction is less successful at recovering information in highly variable format, such as text
written by humans on a variety of subjects.
2. Probabilistic models for information extraction
The simplest probabilistic model for sequences with hidden state is the hidden Markov model, or HMM.
For example, here is a brief text and the most probable (Viterbi) path for that text for two HMMs, one trained
to recognize the speaker in a talk announcement, and one trained to recognize dates. The “-” indicates a
background state:

HMMs have two big advantages over FSAs for extraction. First, HMMs are probabilistic, and thus tolerant
to noise. In a regular expression, if a single expected character is missing, the regex fails to match; with HMMs
there is graceful degradation with missing characters/words, and we get a probability indicating the degree of
match, not just a Boolean match/fail. Second, HMMs can be trained from data; they don’t require laborious
engineering of templates, and thus they can more easily be kept up to date as text changes over time.
3. Conditional random fields for information extraction
An HMM is a generative model; it models the full joint probability of observations and hidden states, and
thus can be used to generate samples. That is, we can use the HMM model not only to parse a text and recover
the speaker and date, but also to generate a random instance of a text containing a speaker and a date.
We can define a simple feature function, for example one that produces a value of 1 if the current word is
ANDREW and the current state is SPEAKER:

Parameter values can be set manually or can be learned from data. Now consider a second feature function:
This feature is true if the current state is SPEAKER and the next word is “said”. Features can also be
defined over transitions between states. The features we defined here were binary, but in general, a feature
function can be any real-valued function.
4. Ontology extraction from large corpora
A different application of extraction technology is building a large knowledge base or ontology of facts
from a corpus. Here is one of the most productive templates:
NP such as NP (, NP)* (,)? ((and | or) NP)? .
Here the bold words and commas must appear literally in the text, but the parentheses are for grouping,
the asterisk means repetition of zero or more, and the question mark means optional. NP is a variable standing
for a noun phrase;
This template matches the texts “diseases such as rabies affect your dog” and “supports network
protocols such as DNS,” concluding that rabies is a disease and DNS is a network protocol. Similar templates
can be constructed with the key words “including,” “especially,” and “or other.” Of course these templates
will fail to match many relevant passages, like “Rabies is a disease.” That is intentional. The “NP is a NP”
template does indeed sometimes denote a subcategory relation, but it often means something else, as in “There
is a God” or “She is a little tired.”
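A rough rendering of the "NP such as NP" template as a pattern matcher is sketched below; the single-word stand-in for NP is an illustrative simplification (a real system would use a parser to find noun phrases):

```python
import re

w = r"[A-Za-z-]+"                      # naive one-word stand-in for a noun phrase
template = re.compile(
    rf"({w}) such as ({w}(?:, {w})*(?:,? (?:and|or) {w})?)")

for text in ["diseases such as rabies affect your dog",
             "supports network protocols such as DNS"]:
    m = template.search(text)
    if m:
        category, members = m.group(1), m.group(2)
        print(f"{members} is a kind of {category}")
# rabies is a kind of diseases
# DNS is a kind of protocols
```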
5. Automated template construction
It is possible to learn templates from a few examples, then use the templates to learn more examples, from
which more templates can be learned, and so on. In one of the first experiments of this kind, Brin (1999)
started with a data set of just five examples:

(“Isaac Asimov”, “The Robots of Dawn”)


(“David Brin”, “Startide Rising”)
(“James Gleick”, “Chaos—Making a New Science”)
(“Charles Dickens”, “Great Expectations”)
(“William Shakespeare”, “The Comedy of Errors”)

Clearly these are examples of the author–title relation, but the learning system had no knowledge of
authors or titles. The words in these examples were used in a search over a Web corpus, resulting in 199
matches. Each match is defined as a tuple of seven strings,
(Author, Title, Order, Prefix, Middle, Postfix, URL),
where Order is true if the author came first and false if the title came first, Middle is the characters between the author and title, Prefix is the 10 characters before the match, Postfix is the 10 characters after the match, and URL is the Web address where the match was made.

In the experiment run by Brin, the first 199 matches generated three templates. The most productive template
was
<LI><B>Title</B> by Author {URL: www.sff.net/locus/c}
The examples were then used to generate more templates, and so on, eventually yielding over 15,000 titles.
Given a good set of templates, the system can collect a good set of examples. Given a good set of examples,
the system can build a good set of templates.
The biggest weakness in this approach is the sensitivity to noise. If one of the first few templates is
incorrect, errors can propagate quickly. One way to limit this problem is to not accept a new example unless
it is verified by multiple templates.
6. Machine reading
A machine reading system is an extraction system with no human input of any kind: it reads on its own and builds up its own database. Such a system is relation-independent; it works for any relation. A representative machine-reading system is TEXTRUNNER. It uses co-training to boost its performance, but it needs something to bootstrap from; for TEXTRUNNER, the original inspiration was a taxonomy of eight very general syntactic templates, as shown in the figure.
Given a set of labeled examples of this type, TEXTRUNNER trains a linear-chain CRF to extract further
examples from unlabeled text. TEXTRUNNER achieves a precision of 88% and recall of 45% on a large Web
corpus. TEXTRUNNER has extracted hundreds of millions of facts from a corpus of a half-billion Web pages.

MACHINE TRANSLATION
Machine translation is a sub-field of computational linguistics that investigates the use of software to translate
text or speech from one language to another.
Applications
o Language interpretation
o Website translation
o Document translation
o Speech translation
o Mobile translation.
Issues: Due to structural variation of different languages, automatic translation is a difficult and challenging
task.
 Word order: arrangement of words varies across languages
Deepika reads a book
Subject Verb Object
தீபிகா புத்தகம் படித்தாள்
Subject Object Verb
 Word Sense: sense of a word in one language may be different from other language
Book a room
In Tamil, "book" (the noun) means "புத்தகம்"; translating it that way here leads to an incorrect machine translation.
 Reference Resolution: Resolving references is important for machine translation; unresolved references may lead to incorrect translation.
Sudha forgot her pen drive in the lab. → Here, "her" refers to Sudha.
 Idioms: The old man finally kicked the bucket.
The actual meaning is "The old man finally died". If we translate it word for word, it leads to an incorrect machine translation.
 Ambiguity: the same word can have different meanings.

Approaches to machine translation


1. Direct machine translation
2. Rule based Machine translation
3. Corpus based machine translation
4. Knowledge based machine translation
1. Direct machine translation
It carries out word by word translation with the help of a bilingual dictionary. Usually followed by some
machine syntactic rearrangements. SL- Source Language, TL- Target Language.

[Pipeline: Source Language Text → Morphological Analysis (SL words) → Bilingual Dictionary Look-up (TL words) → Syntactic Rearrangement → Target Language Text, using an SL–TL dictionary.]

In the morphological analysis stage, the morphological inflections of the words are removed to obtain the root forms of the source language words.
Example: gave → give
In the bilingual dictionary look-up stage, the target language words corresponding to the source language words are retrieved from a bilingual dictionary.
Example: food (English) → "உணவு" (Tamil)
In the syntactic rearrangement stage, the word order is changed to best match the word order of the target language.
Example:
The boy plucked the flower. ( In English)
Subject Verb Object

சிறுவன் பூவவப் பறித்தான் . (In Tamil)


Subject Object Verb

Advantage
 Easy and fast.
Disadvantages
 It does not attempt to disambiguate words, so the quality of the output is not very satisfactory.
 Word-by-word translation may make it difficult to get the correct meaning.
 It does not consider the structure of, and relationships between, words.
 It is developed for a specific language pair and cannot be adapted to a different pair of languages.

2. Rule based Machine translation


It makes use of extensive lexicons equipped with morphological, syntactic, semantic information and a large
set of rules to map the input text to an intermediate representation. The intermediate form may be a parse tree
or some abstract representation. The target language text is generated from the intermediate representation.
Based on the intermediate representation used these systems are categorized as
(i) Transfer based machine translation
(ii) Interlingual machine translation

(i) Transfer based machine translation


It involves parsing the source language text and then transforming the parse structure into a target language structure. During the analysis phase, the source text is analyzed and a source language structure is produced conforming to the rules of the source language. During the transfer phase, the SL representation is transformed into a target language representation by using a dictionary and grammar rules. In the synthesis phase, the TL text is generated by using the target language grammar.
[Pipeline: SL Text → Analysis (using SL grammar) → SL representation → Transfer (using the SL–TL dictionary and grammar) → TL representation → Synthesis (using TL grammar) → TL Text.]
Example:
The boy plucked the flower. ( In English)
Subject Verb Object
Paiyam Poovai Parithan. (In Tamil)
Subject Object Verb

Advantages
 The analysis of the source language is independent of the target language.
 It can handle ambiguities.
 The same grammar rules can be used for analysis as well as generation.
 It is more realistic and flexible, and can be extended to language pairs in a multi-lingual environment.

(ii) Interlingual machine translation


In interlingual based machine translation approach, the source language text is converted in to a language
independent meaning representation called interlingua. From the interlingua representation text are generated
in to other language. During the analysis phase, the source language text is converted in to interlingua
representation which is independent of language. During synthesis phase, the target language text is generated
by referencing the target language grammar rules.

[Pipeline: SL1 ... SLn Text → Analysis (using the respective SL grammar) → interlingua representation → Synthesis (using the respective TL grammar) → TL1 ... TLn Text.]

Advantages
 The same analysis component may be used for more than one target language.
Disadvantages
 Defining universal abstract/ interlingua representation which preserves the meaning of a sentence is
extremely difficult and is a challenging task.
3. Corpus based machine translation
Corpus based machine translation has two types
(i) Example based machine translation
(ii) Statistical based machine translation

(i) Example based machine translation


It uses past translation examples to generate the target representation for a given input. It maintains an example base consisting of translation examples between the source and target languages.

[Pipeline: SL Sentence → Retrieval of a matching example pattern (from the example base) → Adaptation (using adaptation rules and the SL–TL dictionary and grammar) → TL Sentence.]
Advantages
 It is easy to add new examples and enhance the model.
 If a sufficiently large set of examples is available, the accuracy is good.
 It can support multiple languages.
Disadvantages
 Structurally similar sentences may not have similar meanings, so selecting a structurally similar example may misinterpret the meaning.
(ii) Statistical based machine translation

[Pipeline: Source Language → Decoder → Target Language.]

This system is inspired by the noisy channel model. A noisy channel introduces noise that makes it difficult to recognize the input words; the recognition system uses a model of the channel to recover the original words. The input is treated as a distorted version of the target language sentence, and the task is to find the most likely source language sentence given the translation. Assuming that each phrase translation and each distortion is independent of the others, the overall probability factors into a product of per-phrase translation and distortion probabilities.
The procedure of statistical based machine translation are


 Find parallel texts
 Segment into sentences
 Align sentences
 Align phrases
 Extract distortions
 Improve estimates with EM
4. Knowledge based machine translation
A knowledge-based machine translation system uses semantic information along with syntactic information. It requires a large knowledge base that includes lexical knowledge along with an ontology, which shows the relationships between concepts and categories in a subject area or domain.
SYMBOL BASED MACHINE LEARNING
Symbolic machine learning is the branch of artificial intelligence research that concerns itself with attempting to explicitly represent human knowledge in a declarative form. It also explicitly represents the domain knowledge.
Framework for Symbol Based Learning
i. Data and the goals of the learning task
ii. Representation Language
iii. Set of operations
iv. The concept space
v. Heuristic Search
vi. Acquired knowledge
(i) Data and Goals of Learning Task
Types of data
Types of data are positive or negative examples, single positive example and domain specific
knowledge, high level advice and analogies.
Goal of learning algorithm
The goals of learning algorithms are acquisition of concept, general description of a class of objects,
acquisition of plans, acquisition of problem solving heuristics and acquisition of other forms of procedural
knowledge.
Properties and quality of data
The data may be reliable or may contain noise, may be well-structured or unorganized, and may contain both positive and negative examples or only positive ones. The data may come from the outside environment or be generated by the program itself.

           CONCEPT LEARNING                 EXPLANATION BASED              CLUSTERING
DATA       Positive/negative examples       A training example             A set of unclassified
           of a target class                + prior knowledge              instances
GOAL       To infer a general definition    To infer a general concept     To discover categorizations

(ii) Representation of learned knowledge


 Concept expression in predicate calculus
A simple formulation of the concept learning problem as conjunctive sentences containing
variables
size(obj1, small) ∧ color(obj1, red) ∧ shape(obj1, round)
size(obj2, large) ∧ color(obj2, red) ∧ shape(obj2, round)
⇒ size(X, Y) ∧ color(X, red) ∧ shape(X, round)
 Structured representations such as frames
 Descriptions of plans as a sequence of operations or a triangle table
 Representation of heuristics as problem-solving rules
(iii) A Set of Operations
Given a set of training instances, the learner must construct a generalization, heuristic rule, or plan that satisfies its goals. This requires the ability to manipulate representations. Typical operations include:
 generalizing or specializing symbolic expressions
 adjusting the weights in a neural network
 modifying the program's representations
(iv) Concept space
Concept space defines a space of potential concept definitions. The complexity of potential concept space
is a measure of difficulty of learning algorithms.

(v) Heuristic Search


Heuristic search uses the available training data to search efficiently. Patrick Winston's work on learning concepts from positive and negative examples, along with near misses, illustrates this.

[Figure: a positive example of the concept and a near miss]

The program learns by refining a candidate description of the target concept through generalization and specialization.
 Generalization changes the candidate description to let it accommodate new positive examples.

[Figure (a): an example of an arch and its network description]
[Figure (b): generalization of the description, given background knowledge that bricks and pyramids are both polygons]

 Specialization changes the candidate description to exclude near misses.

 The performance of the learning algorithm is highly sensitive to the quality and order of the training examples.

Version Space Search


(i) Concept Space
The concept space defines a space of potential concept definitions. Inductive learning is implemented as search through this concept space. Generalization operations impose an ordering on the concepts in the space, and this ordering is used to guide the search.
Generalization operators:
(i) Replacing constants with variables
color(ball, red) → color(X, red)
(ii) Dropping conditions from a conjunctive expression
shape(X, round) ∧ size(X, small) ∧ color(X, red) → shape(X, round) ∧ color(X, red)
(iii) Adding a disjunct to an expression
shape(X, round) ∧ size(X, small) ∧ color(X, red) → shape(X, round) ∧ size(X, small) ∧ (color(X, red) ∨ color(X, blue))
(iv) Replacing a property with its parent in a class hierarchy
color(X, red) → color(X, primary_color), if primary_color is the superclass of red

(ii) Candidate Elimination Algorithm


Version Space: the set of all concept descriptions consistent with the training examples. The aim is to reduce the size of the version space as more examples become available.
 Specific to general search from positive examples
 General to specific search from negative examples
 The candidate elimination algorithm combines these into a bidirectional search
Combining the two directions of search into a single algorithm has several benefits. The figure gives an abstract description of the candidate elimination algorithm:
"+" signs represent positive instances
"−" signs indicate negative instances
The search "shrinks" the outermost concept to exclude negative instances
The search "expands" the innermost concept to include new positive instances
Candidate elimination Algorithm
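A simplified sketch of candidate elimination for conjunctive attribute-value hypotheses is given below; the "?" wildcard notation, the toy attributes and data are illustrative assumptions, and the sketch keeps only a single most-specific hypothesis and does not prune redundant members of G:

```python
# Simplified candidate elimination for conjunctive attribute-value concepts.
# "?" in a hypothesis matches any value of that attribute.

def covers(h, x):
    """True if hypothesis h matches example x."""
    return all(hv == "?" or hv == xv for hv, xv in zip(h, x))

def more_general(h1, h2):
    """True if h1 is more general than (or equal to) h2."""
    return all(a == "?" or (b != "?" and a == b) for a, b in zip(h1, h2))

def candidate_elimination(examples, domains):
    n = len(domains)
    S = None                                  # most specific boundary (single hypothesis here)
    G = [tuple("?" for _ in range(n))]        # most general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]            # drop general hypotheses that miss x
            S = x if S is None else tuple(                # minimally generalize S to cover x
                sv if sv == xv else "?" for sv, xv in zip(S, x))
        else:
            new_G = []
            for g in G:
                if not covers(g, x):                      # already excludes the negative example
                    new_G.append(g)
                    continue
                for i in range(n):                        # minimally specialize g
                    if g[i] == "?":
                        for v in domains[i]:
                            if v != x[i]:
                                g2 = g[:i] + (v,) + g[i + 1:]
                                if S is None or more_general(g2, S):
                                    new_G.append(g2)
            G = list(dict.fromkeys(new_G))                # deduplicate, keep order
    return S, G

# Toy data: attributes (sky, temperature, humidity); concept "enjoy sport".
domains = [("sunny", "rainy"), ("warm", "cold"), ("normal", "high")]
data = [(("sunny", "warm", "normal"), True),
        (("sunny", "warm", "high"), True),
        (("rainy", "cold", "high"), False)]
S, G = candidate_elimination(data, domains)
print("S boundary:", S)     # ('sunny', 'warm', '?')
print("G boundary:", G)     # [('sunny', '?', '?'), ('?', 'warm', '?')]
```

Positive examples expand the specific boundary S; negative examples shrink the general boundary G, exactly the shrink/expand behaviour described above.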

Drawbacks of the candidate elimination algorithm


 The algorithm may fail to converge because of noise or inconsistency in training data
 Excessive growth of search space

MACHINE LEARNING CONNECTIONIST


Neural networks, also called connectionist or parallel distributed processing models, are based on a simplified model of the brain. They consist of a large number of connected simple computational units (like neurons). Each unit examines its inputs and computes an activation value which is transferred to other units. Each connection has a signed numeric weight that determines how strongly an activation travelling along it affects the receiving unit. The size of the weight determines the magnitude of the influence of the sending unit's activation upon the receiving unit. Therefore the output of a neural network is determined by the connections and the weights of the model.
Foundation of Connection Network
All connectionist models use a similar model of a neuron. There is a collection of units, each of which has:
• A number of weighted inputs from other units
• A threshold that the sum of the weighted inputs is compared against
• A single output to another set of units
A unit (perceptron)

xi are inputs,
wi are weights,
wn is usually set for the threshold with xn =-1 (bias),
y is the weighted sum of inputs including the threshold (activation level)
o is the output. The output is computed using a function that determines how far the perceptron’s activation
level is below or above 0

Training perceptrons
We can train perceptrons to compute the function of our choice
• Start with a perceptron with any values for the weights (usually 0)
• Feed the input, let the perceptron compute the answer
• If the answer is right, do nothing
• If the answer is wrong, then modify the weights by adding or subtracting the input vector (perhaps
scaled down)
• Iterate over all the input vectors, repeating as necessary, until the perceptron learns what we want
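A minimal sketch of this training procedure (weights starting at zero, add or subtract the input vector on mistakes) on a toy linearly separable problem; the data, epoch count and learning rate are illustrative choices:

```python
import numpy as np

# Perceptron with a step activation; weights (including the threshold) start at 0.
# On a wrong answer the (scaled) input vector is added or subtracted.
def train_perceptron(X, targets, epochs=20, lr=1.0):
    X = np.hstack([X, -np.ones((len(X), 1))])    # extra input fixed at -1 acts as the threshold
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, targets):
            o = 1 if w @ x > 0 else 0            # step activation
            w += lr * (t - o) * x                # add/subtract the input vector if wrong
    return w

# Toy task: logical AND (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0, 0, 0, 1])
w = train_perceptron(X, targets)
print(w)                                                        # learned weights and threshold
print([1 if w @ np.append(x, -1.0) > 0 else 0 for x in X])      # [0, 0, 0, 1]
```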
Perceptron Learning
Developed by Frank Rosenblatt by using McCulloch and Pitts model, perceptron is the basic
operational unit of artificial neural networks. It employs supervised learning rule and is able to classify the
data into two classes.
A perceptron has the following three basic elements:
Links − a set of connection links, each carrying a weight, including a bias link whose input is always 1.
Adder − adds the inputs after they are multiplied by their respective weights.
Activation function − limits the output of the neuron. The most basic activation function is a Heaviside step function that has two possible outputs: it returns 1 if the input is positive, and 0 for any negative input.
A perceptron is a function that maps its input vector "x", multiplied by the learned weight coefficients, to an output value "f(x)":

f(x) = 1 if w · x + b > 0, and 0 otherwise

where
"w" = vector of real-valued weights
"b" = bias (an element that adjusts the decision boundary away from the origin without any dependence on the input values)
"x" = vector of input values

Single Layer Network


A single layer neural network consists of a set of units organized in a layer. Each unit Un receives a weighted input Ij with weight Wjn. The figure shows a single layer neural network with j inputs and n outputs.

Multilayer Network
A multilayer network has two or more layers of units, with the output from one layer serving as input
to the next. Generally in a multilayer network there are 3 layers present like, input layer, output layer and
hidden layer. The layer with no external output connections are referred to as hidden layers. A multilayer
neural network structure is given in figure.
Feed Forward neural network
In this network, the information moves in only one direction, forward from the input nodes, through
the hidden nodes and to the output nodes. There are no cycles or loops in the network. In other way we can
say the feed forward neural network is one that does not have any connections from output to input. All inputs
with variable weights are connected with every other node. A single layer feed forward network has one layer
of nodes, whereas a multilayer feed forward network has multiple layers of nodes. The structure of a feed
forward multilayer network is given in figure.

Back Propagation neural network


Multilayer neural networks most commonly use a learning technique called the back propagation algorithm. In a back propagation neural network, the output values are compared with the correct answers to compute the value of some predefined error function. The error is then fed back through the network, and the algorithm adjusts the weights of each connection so as to reduce the value of the error function by some small amount. After repeating this process for a sufficiently large number of training cycles, the network will usually converge to a state where the error of the calculation is small.
The goal of back propagation, as with most training algorithms, is to iteratively adjust the weights in the network to produce the desired output by minimizing the output error; in effect it solves the credit assignment problem. Back propagation is a gradient-descent approach in that it uses the minimization of first-order derivatives to find an optimal solution. The standard back propagation algorithm is given below.

Step 1: Build a network with the chosen number of input, hidden and output units.
Step 2: Initialize all the weights to low random values.
Step 3: Randomly choose a single training pair.
Step 4: Copy the input pattern to the input layer.
Step 5: Cycle the network so that the activation from the inputs generates the activations in the hidden and output layers.
Step 6: Calculate the error derivative between the output activation and the target output.
Step 7: Apply the method of back propagation to the summed products of the weights and errors in the output layer in order to calculate the error in the hidden units.
Step 8: Update the weights attached to each unit according to the error in that unit, the output from the unit below it and the learning parameters, until the error is sufficiently low.
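A compact numeric sketch of these steps for a tiny two-layer network trained on XOR is shown below. The layer sizes, learning rate and epoch count are illustrative, the sketch uses batch updates rather than one random pair per step for brevity, and with random initialization it usually (not always) converges:

```python
import numpy as np

# Tiny 2-4-1 network with sigmoid units trained by back propagation on XOR.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
T = np.array([[0], [1], [1], [0]], float)

W1 = rng.normal(scale=0.5, size=(2, 4)); b1 = np.zeros(4)   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(4, 1)); b2 = np.zeros(1)   # hidden -> output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(5000):
    # forward pass (steps 4-5): compute hidden and output activations
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    # backward pass (steps 6-7): error derivatives, propagated layer by layer
    dY = (Y - T) * Y * (1 - Y)              # output-layer delta
    dH = (dY @ W2.T) * H * (1 - H)          # hidden-layer delta
    # weight updates (step 8), learning rate 0.5
    W2 -= 0.5 * H.T @ dY;  b2 -= 0.5 * dY.sum(axis=0)
    W1 -= 0.5 * X.T @ dH;  b1 -= 0.5 * dH.sum(axis=0)

print(np.round(Y, 2))   # approaches [[0], [1], [1], [0]]
```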
Boltzmann Machines
A Boltzmann machine is a variation on the idea of a Hopfield network. The main problem with Hopfield networks is that they settle into local minima. Having many local minima is good for building content-addressable memories, but for constraint satisfaction tasks we need to find the globally optimal state of the network. A Boltzmann machine is the combination of simulated annealing and a Hopfield network.
Units in a Boltzmann machine update their individual binary states by a stochastic rather than deterministic rule. The probability that any given unit will be active is

p = 1 / (1 + e^(−ΔE/T))

where ΔE is the sum of the unit's active weighted input lines and T is the temperature of the network.
Reinforcement Learning
Reinforcement learning is close to human learning. The algorithm learns a policy of how to act in a given environment. Every action has some impact on the environment, and the environment provides rewards that guide the learning algorithm. Reinforcement learning deals with agents that must sense and act upon their environment.
Network learning procedure
i. The network presented with a sample input from the training set.
ii. The network computes what it thinks should be the sample output
iii. The network is supplied with a real-valued judgment by the teacher
iv. The network adjusts its weights and the process repeats.
v. A positive value in step (iii) indicates good performance, while a negative value indicates bad performance.

Unsupervised learning
In unsupervised learning, no labels are given to the learning algorithm, leaving it on its own to find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end. Given a set of input data, the network is allowed to work on it to try to discover regularities and relationships between the different parts of the input. One popular unsupervised learning scheme is known as competitive learning:
i. Present an input vector.
ii. Calculate the initial activation for each output unit.
iii. Let the output units compete until only one is active.
iv. Increase the weights on the connections between the winning output unit and the active input units. This makes it more likely that the same output unit will win the next time the pattern is repeated.
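A minimal sketch of the competitive-learning loop described above; moving the winner's weight vector toward the input is one standard form of the update in step iv, and the data are illustrative:

```python
import numpy as np

# Competitive learning: output units compete; only the winner's weights are
# moved toward the current input, strengthening its connections to active inputs.
rng = np.random.default_rng(1)
W = rng.random((3, 4)) * 0.1          # 3 output units, 4 input units

def train(inputs, lr=0.1, epochs=20):
    for _ in range(epochs):
        for x in inputs:
            winner = int(np.argmax(W @ x))       # step iii: largest activation wins
            W[winner] += lr * (x - W[winner])    # step iv: strengthen the winner's weights
    return W

inputs = np.array([[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 1]], float)
print(train(inputs).round(2))
```

After training, different output units tend to specialize in different input patterns, which is the clustering behaviour competitive learning is meant to produce.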
