Ch5: Natural Language Processing
Widassie Gerezgiher
Department of Computer Science and Engineering
Department of Information Technology
12/30/2019 1
Artificial Intelligence
Contents
• Overview
• NLP Applications
• NLP Components
• NLP Ambiguity
• NLP Python Libraries for Data Scientists
• Text Processing using Python
Overview
• Natural language processing?
• Natural language
• Processing
(Figure labels: Tigrigna language, English language, Python language)
Overview
• Natural language
• Languages that evolve naturally and that humans use to communicate
• Processing
• How a computer carries out instructions
• Natural language processing ➔ how a computer processes these natural languages
• How to deal with text data
(Figure labels: Tigrigna language, English language, Python language)
Overview
• AI: a computer performing tasks that a human can do
• Humans can see images; we can make machines see images (image processing)
• Humans can interpret languages
• How can we make machines interpret languages?
• Example:
• Assume Rgbe is the customer service manager at a company that provides fingerprint-based door
security systems. Customers call to express their satisfaction with the products.
Overview
Example:
• Assume Rgbe is the customer service manager at a company that provides fingerprint-based door
security systems. Customers call to express their satisfaction with the products (thousands of calls).
• 1: I like your product.
• 2: It's a good product, but could you extend it to be mobile-based?
• 3: I am not interested in this product. I was expecting an additional security
camera and a door that opens and closes automatically as I come in and go out.
• Rgbe can use sentiment analysis to gauge customer satisfaction.
• Sentiment analysis is an application of NLP
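The idea behind sentiment analysis can be sketched with a very small lexicon-based scorer. This is only an illustration: the word lists and the function names `sentiment_score` and `label` are hypothetical, and a practical system would rely on a trained model or a curated lexicon.

```python
import re

# Hypothetical toy lexicon, chosen only to cover the example calls.
POSITIVE = {"like", "good", "great", "love", "interested"}
NEGATIVE = {"bad", "poor", "hate", "disappointed"}
NEGATORS = {"not", "no", "never"}


def sentiment_score(text):
    """Sum word polarities; a negator flips the next sentiment word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)  # +1, -1 or 0
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    return score


def label(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


calls = [
    "I like your product",
    "I am not interested in this product",
]
for call in calls:
    print(call, "->", label(call))
```

Counting labels over all recorded calls would give a rough picture of overall customer satisfaction.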
Overview
• Natural language processing
• It is computer-aided text analysis of human language
• The goal is to enable machines to understand human language and extract meaning
from text
• It is a field of study within artificial intelligence that draws on machine learning and
computational linguistics
• Data science
• Using data to make decisions
Overview
• Data science Venn diagram (figure not reproduced)
NLP Applications
• Sentiment analysis
• Speech recognition
• Chatbot: question and answer for customer service
• Spell checker
• Machine translation
• Advertisement matching: display ads based on your browsing history
• Information extraction
NLP components
• Natural Language Understanding (NLU):
• Mapping the input to a meaningful representation
• Analyzing different aspects of the language
• E.g. text data (sentences, paragraphs) ➔ NLU ➔ system (processed data)
• Natural Language Generation (NLG):
• The process of producing meaningful phrases and sentences in the form of natural
language
• E.g. system (processed data) ➔ NLG ➔ text data (sentences, paragraphs)
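The two directions above can be illustrated with a toy example: an NLU step that maps one fixed sentence pattern to a structured record, and an NLG step that turns the record back into a sentence with a template. The alarm scenario, the pattern, and the names `understand` and `generate` are assumptions made for this sketch; real NLU and NLG components are far more general.

```python
import re


def understand(sentence):
    """Toy NLU: text -> structured representation (a dict)."""
    match = re.search(
        r"set (?:an )?alarm (?:for|at) (\d{1,2})(?::(\d{2}))?",
        sentence.lower(),
    )
    if match is None:
        return {"intent": "unknown"}
    hour = int(match.group(1))
    minute = int(match.group(2) or 0)  # minutes are optional in the pattern
    return {"intent": "set_alarm", "hour": hour, "minute": minute}


def generate(frame):
    """Toy NLG: structured representation -> text, using a template."""
    if frame.get("intent") == "set_alarm":
        return "Your alarm is set for {:02d}:{:02d}.".format(
            frame["hour"], frame["minute"]
        )
    return "Sorry, I did not understand that."


frame = understand("Please set an alarm for 7:30")
print(frame)
print(generate(frame))
```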
NLP Ambiguity
• Difficulties a machine faces when trying to understand a language:
1. Lexical Ambiguity
2. Syntactic Ambiguity
3. Referential Ambiguity
• Lexical Ambiguity (semantic ambiguity)
• The presence of two or more possible meanings within a single word
• E.g. 1. He is looking for a match. 2. The fisherman went to the bank.
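One classical way to resolve lexical ambiguity is to pick the sense whose dictionary gloss shares the most words with the surrounding sentence (a simplified Lesk-style overlap). The sketch below assumes a hypothetical two-sense inventory for "bank" with made-up glosses; it is not how any particular library implements word sense disambiguation.

```python
# Hypothetical sense inventory; the glosses are illustrative only.
SENSES = {
    "bank/river": "sloping land beside a river or lake where a fisherman may stand",
    "bank/finance": "financial institution that accepts deposits and lends money",
}


def disambiguate(sentence, senses=SENSES):
    """Return the sense whose gloss overlaps most with the sentence words."""
    context = set(sentence.lower().replace(".", "").split())

    def overlap(sense):
        return len(context & set(senses[sense].split()))

    # On a tie, max() keeps the first sense in the inventory.
    return max(senses, key=overlap)


print(disambiguate("The fisherman went to the bank."))
print(disambiguate("She deposits money at the bank."))
```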
Cont’d
• Syntactic Ambiguity (structural or grammatical ambiguity)
• The presence of two or more possible meanings within a sentence or a sequence
of words.
E.g.
• The chicken is ready to eat.
• Visiting relatives can be boring.
• I see the man with the binoculars.
Cont’d
• Referential ambiguity
• Arises when it is unclear what a pronoun refers to
• E.g.
• The boy told his father about the theft. He was very upset.
Top 5 Natural Language Processing Python Libraries for
Data Scientists
1. spaCy:
• A highly optimized NLP library designed to work together with
deep learning frameworks such as TensorFlow or PyTorch
2. Gensim
• Gensim is a Python library for topic modeling, document indexing and
similarity retrieval with large corpora
Top 5 Natural Language Processing Python Libraries for
Data Scientists
3. Pattern
• A data mining library for Python used to crawl and parse a variety
of sources such as Google, Twitter, Wikipedia, and many more
4. Natural Language Toolkit (NLTK)
• One of the best-known libraries for training NLP models. It is easy to use
and beginner-friendly, and it includes many pre-trained models and corpora
that make text analysis easier.
Top 5 Natural Language Processing Python Libraries for
Data Scientists
5. TextBlob:
• It is built on both Pattern and NLTK and provides a simple API for
the common NLP operations
• Although it isn't the fastest or most complete library, it offers everything one needs
on a day-to-day basis in an accessible and manageable way
Terminologies in text processing
Text normalization is a form of text processing
1. Tokenization: splitting text into words, phrases, or sentences
2. POS tagging (part-of-speech tagging):
• Processes a sequence of words and attaches a part-of-speech tag to each word.
3. Lemmatization:
• Groups together the different inflected forms of a word under its base form, the lemma.
• Stemming: the process of removing and replacing word suffixes to arrive at a common root form of the word.
4. Stop words:
• Very common words that are often filtered out, e.g. I, me, myself, our, ours, ourselves, you, yours, are, have, can, may, ...
Terminologies in text processing
5. Dependency parsing:
• To figure out how all the words in our sentence relate to each other
6. Finding Noun phrases
Terminologies in text processing
7. Named entity recognition
8. Coreference resolution
Terminologies in text processing
• Note: these are the steps in a typical NLP pipeline, but you may skip
or re-order steps depending on what you want to do and how your NLP
library is implemented.
Tokenization using python
• Tokenizing words and sentences with NLTK, example:
>>> from nltk.tokenize import sent_tokenize, word_tokenize
>>> # Requires the Punkt tokenizer data, e.g. nltk.download('punkt') ('punkt_tab' in newer NLTK releases)
>>> EXAMPLE_TEXT = ("Word tokenization is the process of splitting a large "
...                 "sample of text into words. We can also tokenize the sentences in a paragraph like "
...                 "we tokenized the words. We use the method word_tokenize and sent_tokenize "
...                 "to achieve these.")
>>> print(sent_tokenize(EXAMPLE_TEXT))
>>> print(word_tokenize(EXAMPLE_TEXT))
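To see roughly what a tokenizer does, the sketch below splits text with regular expressions only. It is a simplification and not NLTK's algorithm: the helper names `simple_word_tokenize` and `simple_sent_tokenize` are hypothetical, and cases such as abbreviations ("Dr.") or contractions are not handled.

```python
import re


def simple_word_tokenize(text):
    """Split into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def simple_sent_tokenize(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


text = "Word tokenization splits text into words. We can also split sentences!"
print(simple_sent_tokenize(text))
print(simple_word_tokenize("Hi How are you? i am fine"))
```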
Tokenization
• N-gram: tokens of any number of consecutive written words
• Trigram: tokens of three consecutive written words
• Bigram: tokens of two consecutive written words
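The definitions above amount to sliding a window of n tokens over the token list. A minimal pure-Python sketch (the helper name `make_ngrams` is hypothetical, and this is not NLTK's implementation):

```python
def make_ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    # zip stops at the shortest shifted copy, so only full windows are kept.
    return list(zip(*(tokens[i:] for i in range(n))))


tokens = ["Hi", "How", "are", "you"]
print(make_ngrams(tokens, 2))  # bigrams
print(make_ngrams(tokens, 3))  # trigrams
```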
BiGram tokenization using python
import nltk
from nltk.util import ngrams
text = "Hi How are you? i am fine and you"
token = nltk.word_tokenize(text)
bigram = list(ngrams(token, 2))
print(bigram)
Output
[('Hi', 'How'), ('How', 'are'), ('are', 'you'), ('you', '?'),
('?', 'i'), ('i', 'am'), ('am', 'fine'), ('fine', 'and'), ('and',
'you')]
TriGram tokenization using python
import nltk
from nltk.util import ngrams
text = "Hi How are you? i am fine and you"
token = nltk.word_tokenize(text)
trigram = list(ngrams(token, 3))
print(trigram)
Output
[('Hi', 'How', 'are'), ('How', 'are', 'you'), ('are', 'you',
'?'), ('you', '?', 'i'), ('?', 'i', 'am'), ('i', 'am', 'fine'),
('am', 'fine', 'and'), ('fine', 'and', 'you')]
Lemmatization
• Takes into consideration the morphological analysis of the word
• Groups together the different inflected forms of a word under one base form, called the lemma
• Somewhat similar to stemming, as it maps several words to one
common root
• The output of lemmatization is a proper word
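The difference between the two can be sketched with toy versions of each: a stemmer that only strips suffixes (and may return a non-word) and a lemmatizer that looks the word up in a dictionary. The suffix list, the mini-dictionary, and the names `toy_stem` and `toy_lemmatize` are assumptions for illustration; they are much cruder than the Porter stemmer or a WordNet-based lemmatizer.

```python
# Hypothetical mini-dictionary mapping inflected forms to lemmas.
LEMMAS = {"studies": "study", "studying": "study", "better": "good", "geese": "goose"}
SUFFIXES = ("ing", "ies", "es", "s", "ed")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)


for w in ["studies", "geese", "cars"]:
    print(w, "stem:", toy_stem(w), "lemma:", toy_lemmatize(w))
```

Here "studies" stems to the non-word "stud" but lemmatizes to "study", which is the sense in which the output of lemmatization is a proper word.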
Lemmatize using python
>>> import nltk
>>> from nltk.stem import WordNetLemmatizer
>>> # Requires the WordNet corpus, e.g. nltk.download('wordnet')
>>> wordnet_lemmatizer = WordNetLemmatizer()
>>> word_data = ("Lemmatization is similar to stemming but it brings context "
...              "to the words. So it goes a step further by linking words with similar meaning "
...              "to one word. For example if a paragraph has words like cars, trains and "
...              "automobile, then it will link all of them to automobile. In the below program "
...              "we use the WordNet lexical database for lemmatization.")
>>> # Tokenize first, then print each token next to its lemma
>>> nltk_tokens = nltk.word_tokenize(word_data)
>>> for w in nltk_tokens:
...     print("Actual: %s Lemma: %s" % (w, wordnet_lemmatizer.lemmatize(w)))
Stemming using python
>>> import nltk
>>> from nltk.stem.porter import PorterStemmer
>>> porter_stemmer = PorterStemmer()
>>> word_data = ("In the areas of Natural Language Processing we come across situations where two "
...              "or more words have a common root. For example, the three words - agreed, agreeing and agreeable "
...              "have the same root word agree. A search involving any of these words should treat them as the same "
...              "word which is the root word. So it becomes essential to link all the words into their root word. The "
...              "NLTK library has methods to do this linking and give the output showing the root word. This "
...              "program uses the Porter Stemming Algorithm for stemming.")
>>> # First, word tokenization
>>> nltk_tokens = nltk.word_tokenize(word_data)
>>> # Next, find the root (stem) of each word
>>> for w in nltk_tokens:
...     print("Actual: %s Stem: %s" % (w, porter_stemmer.stem(w)))
Stop Words using python
• Example: list the stop words in English
>>> from nltk.corpus import stopwords
>>> # Requires the stop word lists, e.g. nltk.download('stopwords')
>>> stop_word = set(stopwords.words('english'))
>>> print(stop_word)
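Once a stop word list is available, it is typically used to filter tokens. The sketch below assumes a small hand-written set (taken from the examples listed earlier in these slides) instead of NLTK's list, and the helper name `remove_stop_words` is hypothetical.

```python
# Small illustrative stop word set, not NLTK's full English list.
STOP_WORDS = {
    "i", "me", "myself", "our", "ours", "ourselves",
    "you", "yours", "are", "have", "can", "may",
}


def remove_stop_words(tokens):
    """Keep only tokens that are not stop words (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


print(remove_stop_words(["I", "can", "open", "our", "door"]))
```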
POS tagging (part-of-speech tagging)
• Tagging is a kind of classification
• Parts of speech include nouns, verbs, adverbs, adjectives, pronouns,
conjunctions, and their sub-categories.
• POS tagging: labelling words with their appropriate part of speech.
POS: Tags and Description
POS Tagging using python
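• Example 1:
>>> import nltk
>>> # Requires tokenizer and tagger data, e.g. nltk.download('punkt') and
>>> # nltk.download('averaged_perceptron_tagger'); resource names vary by NLTK version
>>> text = nltk.word_tokenize("And now for something completely")
>>> print(nltk.pos_tag(text))
• Construct a list of tagged tokens directly from a string
• 1st, tokenize the string to access the individual word/tag strings
• 2nd, convert each of these into a tuple (using str2tuple())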
Construct tagged tokens using python
• Example 2:
• By convention in NLTK, a tagged token is represented using a tuple consisting of the
token and the tag.
• Create one of these special tuples from the standard string representation of a tagged
token, using the function str2tuple():
>>> import nltk
>>> tagged_token = nltk.tag.str2tuple('fly/NN')
>>> tagged_token
('fly', 'NN')
>>> tagged_token[0]
'fly'
>>> tagged_token[1]
'NN'
Construct tagged tokens using python
Example 3:
>>> import nltk
>>> sent = '''
... The/AT grand/JJ jury/NN commented/VBD on/IN a/AT number/NN of/IN other/AP topics/NNS ,/,
... AMONG/IN them/PPO the/AT Atlanta/NP and/CC Fulton/NP-tl County/NN-tl purchasing/VBG
... departments/NNS which/WDT it/PPS said/VBD ``/`` ARE/BER well/QL operated/VBN and/CC follow/VB
... generally/RB accepted/VBN practices/NNS which/WDT inure/VB to/IN the/AT best/JJT interest/NN of/IN
... both/ABX governments/NNS
... '''
>>> # Split on whitespace, then convert each word/tag string into a (word, tag) tuple
>>> for t in sent.split():
...     print(nltk.tag.str2tuple(t))
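What the conversion does to each word/tag string can be sketched without NLTK. The helper `split_tagged_token` below is hypothetical; it splits on the last "/" and upper-cases the tag, which is assumed here to mirror the behaviour of NLTK's `str2tuple` and should be checked against the NLTK documentation.

```python
def split_tagged_token(s, sep="/"):
    """Split 'word/TAG' on the last separator into a (word, tag) tuple."""
    word, found, tag = s.rpartition(sep)
    if not found:
        return (s, None)  # no separator: the token is untagged
    return (word, tag.upper())


sent = "The/AT grand/JJ jury/NN commented/VBD"
print([split_tagged_token(t) for t in sent.split()])
```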
Information Extraction
https://www.coursera.org/lecture/python-text-mining/information-extraction-5234x
• Information is hidden in free-text
• Most traditional transactional information is structured
• Abundance of unstructured data, freeform text
• How to convert unstructured text to structured form?
• Goal: identify and extract fields of interest from free text
Field of interest
• Named entity
• [NEWS]: people, places, dates, …
• [FINANCE]: money, company, …
• [MEDICINE]: diseases, drugs, procedures, ….
• [PROTECTED HEALTH INFORMATION]: address, emails, professions, …
• Relations
• What happened to whom, where, when, …
Named Entity Recognition
Cont’d
• Approaches to NER:
• Regular expressions
• Machine learning
• Typical entity types: Person, Organization, Location/GPE
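The regular-expression approach works well for entities with a predictable surface form. The sketch below uses three hypothetical patterns (dates in MM/DD/YYYY form, e-mail addresses, dollar amounts); entity types such as Person or Organization usually need a machine-learned model instead.

```python
import re

# Hypothetical patterns for illustration; real formats vary widely.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "MONEY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?"),
}


def extract_entities(text):
    """Return (entity_text, label) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), m.group(), label))
    return [(entity, label) for _, entity, label in sorted(found)]


text = "Invoice sent to sales@example.com on 12/30/2019 for $1,250.50."
print(extract_entities(text))
```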
Relation Extraction
Co-reference Resolution
Question Answering
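NLP project help
• To open a Tigrigna text file in Python:
1. The file should be saved in UTF-8 encoding
2. f = codecs.open(filename, mode, encoding="utf-8"), where mode is 'r' (read) or 'w' (write)
Example:
import codecs
# Open the UTF-8 encoded file for reading; the with block closes it afterwards
with codecs.open('tigrigna.txt', 'r', encoding='utf-8') as f:
    u = f.read()
print(u)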
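In Python 3 the built-in `open` accepts an `encoding` argument, so the same result can be had without `codecs`. The sketch below writes a short Ge'ez-script sample string to a temporary file and reads it back; the helper names, the sample text, and the temporary path are assumptions made only so the example is self-contained.

```python
import os
import tempfile


def write_text(path, text):
    """Write text to path using UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_text(path):
    """Read the whole file at path as UTF-8 text."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


sample = "ሰላም ዓለም"  # illustrative Ge'ez-script sample string
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tigrigna.txt")
    write_text(path, sample)
    round_trip = read_text(path)

print(round_trip == sample)
```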
Spoken dialog system (Conversational systems)
• A spoken dialog system is a computer system able to converse with a human by voice.
• It has two essential components that do not exist in a written text dialog system: a speech
recognizer and a text-to-speech module
• Components
• Automatic speech recognizer: decodes speech to text
• Language understanding:
• Speech synthesis: converts text to speech
• Dialogue manager:
• Language generation:
Reference
• https://www.analyticsvidhya.com/blog/2017/10/essential-nlp-guide-data-scientists-top-10-nlp-tasks/
Ch6: Robotics
Robotics
• Robotics, the study of robots, is an engineering discipline
• A robot is a physical agent which is capable of executing motion for the
achievement of tasks
• A robot is a reprogrammable multifunctional manipulator designed to move
material, parts, tools, or specialized devices through variable programmed
motions for the performance of a variety of tasks
• A robot's degree of autonomy depends on its ability to perform the ordered
sequence of perception, decision-making and action
The Robot Control Loop
• Sense: speech, vision; acceleration, temperature; position, distance, touch, force;
magnetic field, light; sound, position sense
• Think (process data): task planning, plan classification, learn, path planning, motion planning
• Act:
• Output information: speech, text, visual
• Move: wheels, legs, arms, tracks
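The sense, think, act cycle can be sketched as a loop over three functions. The one-dimensional robot, the bounded step, and all function names below are assumptions chosen to keep the example tiny; a real robot would read actual sensors and drive actual actuators.

```python
def sense(position, target):
    """Sense: measure the signed distance to the target (stand-in for a sensor)."""
    return target - position


def think(distance, max_step=1.0):
    """Think: plan a step towards the target, limited to max_step."""
    return max(-max_step, min(max_step, distance))


def act(position, step):
    """Act: apply the planned step (stand-in for driving the wheels)."""
    return position + step


def run_control_loop(position, target, tolerance=1e-6, max_iterations=100):
    """Repeat sense -> think -> act until the target is reached."""
    history = [position]
    for _ in range(max_iterations):
        distance = sense(position, target)
        if abs(distance) <= tolerance:
            break
        position = act(position, think(distance))
        history.append(position)
    return history


print(run_control_loop(0.0, 3.5))
```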
Why robot?
• Robots make tasks easier
• They can do repetitive tasks without getting bored
• They never get sick
• They never complain
Purpose of Robot
• Dirty tasks
• Repetitive tasks
• Dangerous tasks
• Impossible tasks
• Robots assist people with disabilities
• Can operate equipment at much higher precision than humans
• Cheaper in the long term
Robotic Applications
• Exploration
• Space missions
• Exploring volcanoes
• Underwater exploration
• Medical science
• Surgical assistance
• Assembly: factory parts
• Painting
• Security (bomb disposal)
• Home help (grass cutting, nursing)
Basic elements of a robot
• Manipulator or Rover: main body of the robot (links, joints, and other
structural elements of the robot)
• End Effector: the part that is connected to the last joint (hand) of a
manipulator
• In robotics, it is the device at the end of a robotic arm, designed to interact with the
environment
• End effectors may consist of a gripper or a tool. A gripper can have two fingers, three
fingers, or even five fingers
Basic elements of a robot
• Actuators: the muscles of the manipulator (stepper motor, pneumatic or
hydraulic cylinder). They give motion to all the joints or
components of the robot
• Locomotion and Manipulation
• Locomotion: Legs, Wheels, other exotic means
• Manipulation: degree of freedom, arms, grippers
Basic elements of a robot
• Sensors: to collect information about the internal state of the robot or to
communicate with the outside environment
• Controller: similar to the cerebellum. It controls and coordinates the motion of
the actuators.
• Processor: the brain of the robot. It calculates the motions and the velocity
of the robot’s joints, etc.
• Software: operating system, robotic software and the collection of routines
Building Blocks of Robot System
Robot System:
• Control System
• Mechanical System
• Signal Processing System
• Power Supply System
• Sensors
Mechanical System
• The most basic and important part of a robot
• This system determines the locomotion of the robot
• It lets the robot move in any direction, but
• A device called an actuator is needed to convert electrical energy to mechanical energy
• The most popular actuator is the DC motor
Power Supply System, Sensors
• To work properly, a robot needs a power supply, which acts as its fuel
• DC power is supplied, typically by a battery
• For a robot to work on its own, we fit it with sensors
• They sense temperature, radio waves, heat, pressure
• The data from the sensors must be processed
• Electrical and digital signals need to be processed so that the robot can analyze the situation and
make its move
• For this we introduce electronic components to process the signals
Signal Processing Systems
(Figure: signal processing systems, showing analog signals and ADC (analog-to-digital converter) blocks)
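An ADC turns an analog voltage into an integer code that the processor can work with. The sketch below models an ideal converter; the 5 V reference, the 10-bit resolution, and the function name `adc_convert` are assumptions for illustration, not values taken from the slides.

```python
def adc_convert(voltage, v_ref=5.0, bits=10):
    """Quantize a voltage in [0, v_ref] to an integer code of the given width."""
    levels = 2 ** bits
    clamped = min(max(voltage, 0.0), v_ref)  # out-of-range inputs saturate
    code = int(clamped / v_ref * levels)
    return min(code, levels - 1)  # full scale maps to the top code


for v in (0.0, 2.5, 5.0):
    print(v, "V ->", adc_convert(v))
```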
Control System
• The major governing system of a robot
• Every system and function inside a robot can be represented
in the form of a control system
• Based on control, robots are classified as manual, semi-autonomous,
and autonomous
Control System
Robots
• Manual: wired, wireless
• Semi-autonomous
• Autonomous: pre-programmed, self-learning
Design and Build a Robot
• Mathematical models
• Kinematics: the study of motion without consideration of force and torque
• The science of geometry in motion. It is restricted to a purely geometrical description of
motion by means of position, orientation, and their time derivatives.
• Dynamics: the study of motion in relation to force and torque (a moving
body gives rise to dynamics)
• The science of motion. It describes why and how motion occurs when forces and
moments are applied to massive bodies
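As a small kinematics example, the forward kinematics of a planar two-link arm gives the end-effector position purely from geometry, with no forces involved. The two-link arm, the link lengths, and the function name are assumptions chosen for illustration; angles are in radians, with the second angle measured relative to the first link.

```python
import math


def forward_kinematics(l1, l2, theta1, theta2):
    """End-effector (x, y) of a planar two-link arm."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


# Arm stretched along the x-axis, then with the elbow bent by 90 degrees.
print(forward_kinematics(1.0, 1.0, 0.0, 0.0))
print(forward_kinematics(1.0, 1.0, 0.0, math.pi / 2))
```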
Design and Build a Robot
• Control: relates the dynamics and kinematics of a robot to a
prescribed motion
• Motion planning
• Motion control
• Force control
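Motion control can be illustrated with the simplest feedback law, a proportional controller that commands a joint velocity proportional to the position error. The sketch assumes an idealized joint that follows the commanded velocity exactly; the gain, time step, and function name are hypothetical.

```python
def simulate_p_control(q_start, q_target, kp=2.0, dt=0.1, steps=100):
    """Drive a joint angle towards q_target with proportional control."""
    q = q_start
    trajectory = [q]
    for _ in range(steps):
        error = q_target - q
        velocity = kp * error  # proportional control law
        q += velocity * dt     # idealized joint: integrates the velocity
        trajectory.append(q)
    return trajectory


trajectory = simulate_p_control(0.0, 1.0)
print(round(trajectory[1], 3), round(trajectory[-1], 6))
```

With these values the error shrinks by a factor of 0.8 per step, so the joint settles on the target without overshoot.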
Cont’d
• To design and build the robot we need to understand
• The mathematical model
• This mathematical model is used to create controllers
• Controllers help us control motion
• It's important to plan motions that are safe and generate trajectories that are smooth
• The robot interacts with the environment to move, so we need to rely on force control
Example: accomplished by
• Controlling the joint position and
• Creating trajectories
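One common way to create a smooth joint trajectory is a cubic polynomial that starts and ends at rest. The sketch below is a generic illustration, not the method used in the slide's example; the function name and the chosen angles are assumptions.

```python
def cubic_trajectory(q0, qf, duration, t):
    """Joint position at time t on a cubic path from q0 to qf with zero end velocities."""
    s = min(max(t / duration, 0.0), 1.0)  # normalized time, clamped to [0, 1]
    blend = 3 * s ** 2 - 2 * s ** 3       # rises smoothly from 0 to 1
    return q0 + (qf - q0) * blend


# Sample a 2-second move from 0 to 10 degrees every half second.
for step in range(5):
    t = 0.5 * step
    print(t, cubic_trajectory(0.0, 10.0, 2.0, t))
```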
Cont’d
• Final exam
• 85%: Ch-4, 5, 6
• 15%: Ch-1, 2, 3
• Choose → 5
• Match → 5
• True/False → 5
• Work out → 5
The End ☺