
What is NLP?

Natural Language Processing (NLP) is a field of artificial intelligence (AI) that focuses on enabling
computers to understand, interpret, and generate human language — both written and spoken.

It bridges the gap between human communication and computer understanding, allowing
machines to work with natural language data.

Key Issues in NLP

1. Ambiguity

o Lexical Ambiguity: Words have multiple meanings (e.g., “bank” = riverbank or financial institution).

o Syntactic Ambiguity: Sentence structure can be interpreted in more than one way (see the parser sketch after this list).

Example:

Sentence:
"I saw the man with the telescope."

Possible interpretations:

 I used the telescope to see the man. (The phrase with the telescope modifies saw.)

 The man I saw had a telescope. (The phrase with the telescope modifies the man.)

o Semantic Ambiguity: Meaning is unclear without context.

Example:

Sentence:
"I went to the bank."

Possible meanings:

 Bank as a financial institution: you went to a place to deposit or withdraw money.

 Bank as the side of a river: you went to the edge of a river or lake.

2. Data Sparsity

Many languages and specialized domains don’t have enough quality annotated data to train models effectively.

3. Out-of-Vocabulary (OOV) Words

New words, typos, slang, or rare terms that models haven’t seen before cause comprehension issues (see the subword tokenization sketch after this list).

4. Handling Idioms and Figurative Language

Phrases like "kick the bucket" (meaning “to die”) are misinterpreted by systems that read them literally.
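To make the syntactic ambiguity above concrete, here is a minimal sketch using NLTK’s chart parser with a small toy grammar. The grammar rules are an illustrative assumption written just for this sentence, not a standard English grammar; the parser finds two distinct trees, one per interpretation.

import nltk

# A toy grammar (illustrative only) that licenses both readings of the sentence.
grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'the'
N -> 'man' | 'telescope'
V -> 'saw'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I saw the man with the telescope".split()):
    print(tree)  # two trees: PP attached to the VP (saw) or to the NP (the man)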
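For the OOV issue, modern systems typically sidestep unknown words with subword tokenization: a rare word is split into smaller pieces that are in the vocabulary instead of becoming a single unknown token. A minimal sketch with the Hugging Face tokenizers; the example word is arbitrary, and the exact pieces depend on the model’s vocabulary.

from transformers import AutoTokenizer

# BERT's WordPiece vocabulary: an unseen word is broken into known pieces
# ("##" marks a continuation piece) rather than mapped to a single unknown token.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.tokenize("cryptozoology"))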

What is Morphological Processing?

Morphological processing is the step in NLP that deals with the structure of words — how words are
formed from smaller meaningful units called morphemes (like roots, prefixes, suffixes).

It includes tasks like:

 Stemming: Reducing words to their base or root form (not necessarily a real word).

 Lemmatization: Reducing words to their dictionary (lemma) form using vocabulary and
context.

 Morphological Analysis: Breaking down words into morphemes (root + affixes).

Example: Morphological Processing for the word “running”

Process          Output      Explanation
Input Word       running     The word to analyze
Stemming         run         Remove suffix “-ing” to get stem “run”
Lemmatization    run         Lemma is “run”, the dictionary form
Morpheme Split   run + ing   Root = "run", suffix = "-ing" (present participle)

Python Example with NLTK

import nltk
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer

# Download required resources (run once)
nltk.download('wordnet')
nltk.download('omw-1.4')

word = "running"

# Stemming
ps = PorterStemmer()
stem = ps.stem(word)

# Lemmatization
lemmatizer = WordNetLemmatizer()
lemma = lemmatizer.lemmatize(word, pos='v')  # pos='v' for verb

print("Original word:", word)
print("Stemmed word:", stem)
print("Lemmatized word:", lemma)

Output:

Original word: running
Stemmed word: run
Lemmatized word: run
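On “running” the stemmer and lemmatizer agree, but they often diverge. A short sketch contrasting them on a few more words; this word list is my own illustration.

from nltk.stem import PorterStemmer, WordNetLemmatizer

ps = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# (word, WordNet POS tag) pairs chosen to show where the two methods differ,
# e.g. "studies" stems to "studi" but lemmatizes to "study".
words = [("studies", "v"), ("better", "a"), ("geese", "n")]
for word, pos in words:
    print(f"{word:10} stem: {ps.stem(word):8} lemma: {lemmatizer.lemmatize(word, pos)}")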

What is Syntax Analysis?

Syntax Analysis (also called Parsing) is the process in NLP that examines the grammatical structure of
a sentence. It identifies how words relate to each other to form phrases, clauses, and overall
sentence meaning according to the rules of a language.

The goal is to build a parse tree or syntax tree that shows the syntactic structure.

Why is Syntax Analysis important?

 Helps understand the grammatical relationships between words.

 Crucial for tasks like machine translation, question answering, and information extraction.

 Differentiates sentences with similar words but different meanings based on structure.

Types of Syntax Analysis

1. Constituency Parsing
Breaks a sentence into nested constituents or phrases (noun phrase, verb phrase, etc.).

2. Dependency Parsing
Represents grammatical relations as links between words (e.g., subject, object).

Example Sentence

"The cat sat on the mat."

Constituency Parse Tree (simplified):

(S
  (NP The cat)
  (VP sat
    (PP on
      (NP the mat))))

 S = Sentence

 NP = Noun Phrase

 VP = Verb Phrase

 PP = Prepositional Phrase
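If you want to work with such a bracketed tree programmatically, NLTK can parse and render it. A minimal sketch; the tree string is typed in by hand here, not produced by a parser.

from nltk import Tree

# Bracketed tree entered manually (matches the simplified tree above).
tree = Tree.fromstring("(S (NP The cat) (VP sat (PP on (NP the mat))))")
tree.pretty_print()  # draws the constituency tree as ASCII art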

Example of Dependency Parsing:

 sat is the root verb

 cat is the subject of sat

 on is a preposition linked to sat

 mat is the object of the preposition on

Python Example using spaCy (Dependency Parsing)

import spacy

# Load English model
nlp = spacy.load("en_core_web_sm")

sentence = "The cat sat on the mat."
doc = nlp(sentence)

# Print each token's dependency relation and its head
for token in doc:
    print(f"{token.text:10} --> {token.dep_:10} --> {token.head.text}")

Output:

The --> det --> cat
cat --> nsubj --> sat
sat --> ROOT --> sat
on --> prep --> sat
the --> det --> mat
mat --> pobj --> on
. --> punct --> sat

What is Semantic Analysis?


Semantic Analysis is the process of understanding the meaning of text. It goes beyond the structure
(syntax) to capture what the text actually means — the concepts, relationships, and the intended
message.

Why is Semantic Analysis Important?

 To understand the context and meaning of sentences.

 Enables applications like question answering, chatbots, machine translation, and information retrieval.

 Helps resolve ambiguities, e.g., word sense disambiguation.

Key Tasks in Semantic Analysis

1. Word Sense Disambiguation
Determine which sense of a word is used in context (e.g., “bank” as riverbank vs. financial bank); see the Lesk sketch after this list.

2. Named Entity Recognition (NER)
Identify entities like people, places, organizations.

3. Semantic Role Labeling
Identify predicate-argument structures — who did what to whom.

4. Coreference Resolution
Find which words refer to the same entity (e.g., “John ... he”).

5. Sentiment Analysis
Determine the sentiment or emotion expressed.
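For word sense disambiguation, NLTK ships the classic Lesk algorithm, which picks the WordNet sense whose dictionary gloss overlaps the context words most. A minimal sketch; the example sentences are arbitrary, and Lesk is a simple overlap heuristic, so its pick is not always the intuitive sense.

from nltk import word_tokenize
from nltk.wsd import lesk

# Requires: nltk.download('punkt'), nltk.download('wordnet')
for sentence in ["I went to the bank to deposit money.",
                 "We sat on the bank of the river."]:
    sense = lesk(word_tokenize(sentence), "bank", pos="n")
    print(sentence)
    print("  ->", sense, "-", sense.definition() if sense else "no sense found")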

Example: Sentence Meaning

Sentence: “John gave Mary a book.”

 Semantic analysis identifies:

o John = giver (agent)

o Mary = receiver (recipient)

o book = object (theme)

 The relation: John → gave → Mary (with object book)

Simple Semantic Analysis in Python (using spaCy for NER and dependencies)

import spacy

nlp = spacy.load("en_core_web_sm")
text = "Apple is looking at buying a startup in New York."
doc = nlp(text)

print("Named Entities:")
for ent in doc.ents:
    print(ent.text, ent.label_)

print("\nSemantic Roles (Subject - Verb - Object):")
for token in doc:
    if token.dep_ == "ROOT":
        subject = [child for child in token.children if child.dep_ == "nsubj"]
        # Search the root's whole subtree: here the direct object ("startup")
        # attaches to "buying", not directly to the root "looking".
        dobj = [t for t in token.subtree if t.dep_ == "dobj"]
        if subject and dobj:
            print(f"{subject[0].text} - {token.text} - {dobj[0].text}")

Output:

Named Entities:

Apple ORG

New York GPE

Semantic Roles (Subject - Verb - Object):

Apple - looking - startup

What is Discourse Integration?

Discourse Integration is the process of understanding how individual sentences or utterances connect to form a coherent whole in a text or conversation. It goes beyond analyzing single sentences to interpreting the relationships between sentences, paragraphs, or turns in dialogue.

It helps NLP systems understand context across multiple sentences, maintain topic continuity, and grasp implied meaning.

Why is Discourse Integration Important?

 Maintains coherence in text understanding.

 Resolves pronouns and references across sentences (anaphora resolution).

 Detects relations like cause-effect, contrast, elaboration between sentences.

 Essential for text summarization, dialogue systems, story understanding, and machine translation.

Key Tasks in Discourse Integration

1. Anaphora Resolution
Identifying what pronouns (he, she, it, they) refer to across sentences.

2. Coherence Relations
Understanding logical relations between sentences, e.g., contrast, cause, elaboration.

3. Discourse Parsing
Structuring text into discourse units linked by relations.

Example

Text:
“John went to the park. He saw a dog.”

 Discourse integration links “He” in the second sentence to “John” in the first.

 It also recognizes that the two sentences form a coherent account of John’s activities.

Simple Python Example: Anaphora Resolution with neuralcoref (extension to spaCy)

Note: neuralcoref only supports spaCy 2.x; it is not compatible with spaCy 3+.

pip install spacy neuralcoref

import spacy
import neuralcoref

# Load the English model and add neuralcoref to spaCy's pipeline
nlp = spacy.load('en_core_web_sm')
neuralcoref.add_to_pipe(nlp)

text = "John went to the park. He saw a dog."
doc = nlp(text)

print("Original Text:")
print(text)

print("\nAfter Coreference Resolution:")
print(doc._.coref_resolved)

Output:

Original Text:

John went to the park. He saw a dog.

After Coreference Resolution:

John went to the park. John saw a dog.


What is Pragmatic Analysis?

Pragmatic Analysis in NLP is the process of understanding the intended meaning of language in
context — not just what the words say literally, but what the speaker/writer actually means based
on the situation, shared knowledge, and social cues.

It deals with things like:

 Implicature (implied meaning beyond literal words)

 Speech acts (e.g., requests, promises, questions)

 Contextual factors (who is speaking, to whom, when, where)

 Deixis (words like “this,” “that,” “here,” “now” whose meaning depends on context)

Why is Pragmatic Analysis Important?

 Understands indirect meaning (e.g., sarcasm, irony, politeness)

 Helps dialogue systems respond appropriately

 Crucial for natural conversations, sentiment understanding, humor detection

Example

Sentence:
“Can you pass the salt?”

 Literal meaning: Asking if someone is capable of passing the salt.

 Pragmatic meaning: Polite request for the salt.
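Full pragmatic analysis needs context, but simple surface cues can flag some indirect speech acts. Below is a minimal rule-based sketch; the opener patterns and labels are illustrative assumptions of mine, not a standard taxonomy or a library API.

# Minimal rule-based speech-act sketch. The patterns and labels are
# illustrative; real systems use dialogue context and trained classifiers.
REQUEST_OPENERS = ("can you", "could you", "would you", "would you mind")

def classify_utterance(utterance: str) -> str:
    text = utterance.lower().strip()
    if text.endswith("?") and text.startswith(REQUEST_OPENERS):
        return "indirect request"  # question in form, request in intent
    if text.endswith("?"):
        return "question"
    return "statement"

print(classify_utterance("Can you pass the salt?"))  # indirect request
print(classify_utterance("Where is the salt?"))      # question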

Simple Example in NLP

Pragmatic understanding often requires context or external knowledge beyond the sentence alone. Here is a simple example using contextual dialogue:

dialogue = [
    "Person A: It's cold in here.",
    "Person B: I'll close the window.",
]
# Literal meaning: Person A states a fact.
# Pragmatic meaning: Person A is indirectly requesting that the window be closed.

print("Dialogue:")
for line in dialogue:
    print(line)

print("\nPragmatic interpretation:")
print("Person A's statement implies a request to close the window.")

Challenges of Pragmatic Analysis

 Requires world knowledge and context tracking.

 Hard to automate fully — often involves common sense reasoning.

 Complex in multi-turn conversations.
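End-to-End Example: spaCy Pipeline with Question Answering

The following example ties the earlier stages together on a small solar-system text: tokenization and POS tagging, named entity recognition, dependency parsing, simple fact extraction, and finally extractive question answering with a Hugging Face transformers model.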

import spacy
from transformers import pipeline

# Load spaCy model
nlp = spacy.load("en_core_web_sm")

# Sample solar system text (knowledge base)
solar_system_text = """
Jupiter is the largest planet in the solar system.
It has a strong magnetic field and at least 79 moons.
Saturn is known for its ring system.
Mars is called the Red Planet due to iron oxide on its surface.
Venus has a thick atmosphere rich in carbon dioxide.
Earth has only one moon.
Earth's moon is named Chandrama.
"""

# Process with spaCy
doc = nlp(solar_system_text)

# -------------------------------
# 1. Tokenization and POS tagging
# -------------------------------
print("=== Token Info ===")
for token in doc[:10]:  # show first 10 tokens
    print(f"{token.text:<12} POS: {token.pos_:<6} | Lemma: {token.lemma_}")

# -------------------------------
# 2. Named Entity Recognition
# -------------------------------
print("\n=== Named Entities ===")
for ent in doc.ents:
    print(f"{ent.text:<20} --> {ent.label_}")

# -------------------------------
# 3. Dependency Parsing
# -------------------------------
print("\n=== Dependency Parsing ===")
for sent in doc.sents:
    print(f"\nSentence: {sent.text.strip()}")
    for token in sent:
        print(f"{token.text:<12} Head: {token.head.text:<12} Dep: {token.dep_}")

# -------------------------------
# 4. Extract simple facts
# -------------------------------
print("\n=== Extracted Facts ===")
for sent in doc.sents:
    if "moon" in sent.text.lower() or "ring" in sent.text.lower():
        print("Fact:", sent.text.strip())

# -------------------------------
# 5. Question Answering (QA)
# -------------------------------
print("\n=== Question Answering ===")

# Load QA model
qa_pipeline = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

# Sample questions
questions = [
    "How many moons does Jupiter have?",
    "Which planet has a ring system?",
    "Why is Mars called the Red Planet?",
    "What is Venus's atmosphere made of?",
    "What is the largest planet?",
    "How many moons does Earth have?",
    "What is the name of Earth's moon?",
]

# Ask questions
for question in questions:
    result = qa_pipeline(question=question, context=solar_system_text)
    print(f"Q: {question}")
    print(f"A: {result['answer']}\n")
