AI UNIT-3 Knowledge Representation
Humans are good at understanding, reasoning, and interpreting knowledge. Humans know things, and using
that knowledge they perform various actions in the real world. How machines do all of these things comes
under knowledge representation and reasoning. Hence we can describe knowledge representation as follows:
o Knowledge representation and reasoning (KR, KRR) is the part of Artificial intelligence which is concerned
with how AI agents think and how thinking contributes to the intelligent behavior of agents.
o It is responsible for representing information about the real world so that a computer can understand it
and can utilize this knowledge to solve complex real-world problems, such as diagnosing a medical
condition or communicating with humans in natural language.
o It also describes how we can represent knowledge in artificial intelligence. Knowledge
representation is not just storing data in a database; it also enables an intelligent machine to
learn from that knowledge and experience so that it can behave intelligently like a human.
What to Represent:
Following are the kinds of knowledge which need to be represented in AI systems:
o Object: All the facts about objects in our world domain. E.g., guitars contain strings, trumpets are
brass instruments.
o Events: Events are the actions which occur in our world.
o Performance: It describes behavior which involves knowledge about how to do things.
o Meta-knowledge: It is knowledge about what we know.
o Facts: Facts are the truths about the real world and what we represent.
o Knowledge-Base: The central component of a knowledge-based agent is the knowledge base, represented
as KB. The knowledge base is a group of sentences (here, "sentence" is used as a technical term; it is
not identical to a sentence in the English language).
Knowledge: Knowledge is awareness or familiarity gained by experiences of facts, data, and situations.
Types of knowledge
Following are the various types of knowledge in artificial intelligence:
1. Declarative knowledge
2. Procedural knowledge
3. Meta-knowledge
4. Heuristic knowledge
5. Structural knowledge
Suppose you meet a person who speaks a language you don't know; how will you be able to act on that? The
same applies to the intelligent behavior of agents.
As we can see in the diagram below, there is a decision maker which acts by sensing the environment and
using knowledge. But if the knowledge part is not present, it cannot display intelligent behavior.
AI knowledge cycle:
An Artificial intelligence system has the following components for displaying intelligent behavior:
o Perception
o Learning
o Knowledge Representation and Reasoning
o Planning
o Execution
The above diagram shows how an AI system can interact with the real world and which components help it
to show intelligence. An AI system has a Perception component by which it retrieves information from its
environment; this can be visual, audio, or another form of sensory input. The learning component is responsible
for learning from the data captured by the Perception component. In the complete cycle, the main components
are Knowledge Representation and Reasoning. These two components are involved in making a machine show
human-like intelligence; they are distinct from each other but also coupled together. Planning and execution
depend on the analysis of knowledge representation and reasoning.
Approaches to knowledge representation:
There are mainly four approaches to knowledge representation, which are given below.
1. Simple relational knowledge:
o It is the simplest way of storing facts which uses the relational method; each fact about a set of
objects is set out systematically in columns.
o This approach of knowledge representation is famous in database systems where the relationship
between different entities is represented.
o This approach has little opportunity for inference.
Player     Weight   Age
Player1    65       23
Player2    58       18
Player3    75       24
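As a minimal sketch, such relational facts could be held in Python as rows of tuples (the column names Weight
and Age are assumptions read off the table above):

# A relational store: each fact about a player is one row of the table.
players = [
    ("Player1", 65, 23),
    ("Player2", 58, 18),
    ("Player3", 75, 24),
]

# Lookup is systematic, but there is little opportunity for inference:
# nothing beyond the stored rows can be derived.
for name, weight, age in players:
    print(name, weight, age)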
2. Inheritable knowledge:
o In the inheritable knowledge approach, all data must be stored in a hierarchy of classes.
o All classes should be arranged in a generalized form or in a hierarchical manner.
o In this approach, we apply the inheritance property.
o Elements inherit values from other members of a class.
o This approach contains inheritable knowledge which shows the relation between an instance and a class;
this is called the instance relation.
o Every individual frame can represent a collection of attributes and their values.
o In this approach, objects and values are represented in boxed nodes.
o Arrows point from objects to their values.
o Example:
3. Inferential knowledge:
o The inferential knowledge approach represents knowledge in the form of formal logic.
o It can be used to derive more facts, and it guarantees correctness.
4. Procedural knowledge:
o The procedural knowledge approach uses small programs and code which describe how to do specific
things and how to proceed.
o In this approach, one important rule is used: the If-Then rule (a minimal sketch follows this list).
o With this knowledge, we can use various coding languages such as LISP and Prolog.
o We can easily represent heuristic or domain-specific knowledge using this approach.
o However, it is not necessarily possible to represent all cases in this approach.
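As an illustration, here is a minimal Python sketch of procedural, If-Then style knowledge (rather than LISP or
Prolog); the rule, threshold, and names are hypothetical, not taken from the text:

# A hypothetical If-Then rule encoded as procedural knowledge:
# the "how to proceed" logic lives in code, not in a data table.
def advise(temperature_c):
    if temperature_c > 38.0:        # IF the condition holds...
        return "see a doctor"       # ...THEN take this action
    return "no action needed"

print(advise(39.2))   # -> see a doctor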
Requirements for a knowledge representation system:
A good knowledge representation system must possess the following properties:
1. Representational Accuracy:
The KR system should have the ability to represent all kinds of required knowledge.
2. Inferential Adequacy:
The KR system should have the ability to manipulate the representational structures to produce new
knowledge corresponding to the existing structures.
3. Inferential Efficiency:
The ability to direct the inferential mechanism into the most productive directions by storing
appropriate guides.
4. Acquisitional Efficiency:
The ability to acquire new knowledge easily using automatic methods.
To represent such statements, propositional logic is not sufficient, so we require a more powerful logic, such as
first-order logic.
First-Order logic:
o First-order logic is another way of knowledge representation in artificial intelligence. It is an extension
to propositional logic.
o FOL is sufficiently expressive to represent the natural language statements in a concise way.
o First-order logic is also known as Predicate logic or First-order predicate logic. First-order logic is a
powerful language that expresses information about objects in a more natural way and can also
express the relationships between those objects.
o First-order logic (like natural language) does not only assume that the world contains facts (as
propositional logic does) but also assumes the following things in the world:
o Objects: A, B, people, numbers, colors, wars, theories, squares, pits, wumpus, ......
o Relations: These can be unary relations such as: red, round, is adjacent; or n-ary relations such
as: the sister of, brother of, has color, comes between.
o Function: Father of, best friend, third inning of, end of, ......
o Like natural language, first-order logic also has two main parts:
a. Syntax
b. Semantics
Following are some basic elements of first-order logic syntax:
Variables:     x, y, z, a, b, ...
Connectives:   ∧, ∨, ¬, ⇒, ⇔
Equality:      ==
Quantifier:    ∀, ∃
Atomic sentences:
o Atomic sentences are the most basic sentences of first-order logic. These sentences are formed from a
predicate symbol followed by a parenthesized sequence of terms.
o We can represent atomic sentences as Predicate(term1, term2, ..., term n).
Consider the statement "x is an integer." It consists of two parts: the first part, x, is the subject of the
statement, and the second part, "is an integer," is known as the predicate.
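For instance, with predicate and constant names chosen purely for illustration, "Chinky is a cat" can be written
as the atomic sentence cat(Chinky), and "Ravi and Ajay are brothers" as Brothers(Ravi, Ajay).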
Universal Quantifier:
The universal quantifier is a symbol of logical representation which specifies that the statement within its range
is true for everything or every instance of a particular thing. It is denoted by the logical operator ∀, which
resembles an inverted A. If x is a variable, then ∀x is read as:
o For all x
o For each x
o For every x.
Example:
All men drink coffee.
In first-order logic this can be written as:
∀x man(x) → drink(x, coffee)
It will be read as: For all x, where x is a man, x drinks coffee.
Existential Quantifier:
Existential quantifiers are the type of quantifiers, which express that the statement within its scope is true for at
least one instance of something.
It is denoted by the logical operator ∃, which resembles an inverted E. When it is used with a predicate variable,
it is called an existential quantifier.
If x is a variable, then the existential quantifier will be ∃x or ∃(x), and it will be read as:
o There exists an x
o For some x
o For at least one x
Example:
Some boys are intelligent.
In first-order logic this can be written as:
∃x: boys(x) ∧ intelligent(x)
It will be read as: There is some x, where x is a boy and x is intelligent.
Points to remember:
o The main connective for universal quantifier ∀ is implication →.
o The main connective for existential quantifier ∃ is and ∧.
Properties of Quantifiers:
o In universal quantifier, ∀x∀y is similar to ∀y∀x.
o In Existential quantifier, ∃x∃y is similar to ∃y∃x.
o ∃x∀y is not similar to ∀y∃x.
Free Variable: A variable is said to be a free variable in a formula if it occurs outside the scope of the quantifier.
Bound Variable: A variable is said to be a bound variable in a formula if it occurs within the scope of the
quantifier.
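For example, in ∀x ∃y [P(x, y, z)], the variables x and y are bound by the quantifiers, while z is a free variable.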
Techniques of knowledge representation:
There are mainly four ways of knowledge representation, which are given as follows:
1. Logical Representation
2. Semantic Network Representation
3. Frame Representation
4. Production Rules
1. Logical Representation
Logical representation is a language with some concrete rules which deals with propositions and has no
ambiguity in representation. Logical representation means drawing a conclusion based on various conditions.
This representation lays down some important communication rules. It consists of precisely defined syntax and
semantics which support sound inference. Each sentence can be translated into logic using its syntax and
semantics.
Syntax:
o Syntax is the set of rules which decide how we can construct legal sentences in the logic.
o It determines which symbols we can use in knowledge representation and how to write those symbols.
Semantics:
o Semantics are the rules by which we can interpret the sentences in the logic.
o Semantics also involves assigning a meaning to each sentence.
Logical representation can be categorized into mainly two logics:
a. Propositional logic
b. Predicate logic
Disadvantages of logical representation:
1. Logical representations have some restrictions and are challenging to work with.
2. The logical representation technique may not be very natural, and inference may not be very efficient.
Note: Do not be confused with logical representation and logical reasoning as logical representation is a
representation language and reasoning is a process of thinking logically.
2. Semantic Network Representation
Semantic networks represent knowledge in the form of graphical networks, which consist of nodes representing
objects and arcs describing the relationships between those objects.
Example: Following are some statements which we need to represent in the form of nodes and arcs.
Statements:
a. Jerry is a cat.
b. Jerry is a mammal.
c. Jerry is owned by Priya.
d. Jerry is brown colored.
e. All mammals are animals.
In the above diagram, we have represented the different types of knowledge in the form of nodes and arcs. Each
object is connected to another object by some relation.
Drawbacks in semantic representation:
1. Semantic networks take more computational time at runtime, as we need to traverse the complete
network tree to answer a question. In the worst case, after traversing the entire tree we may find
that the solution does not exist in the network.
2. Semantic networks try to model human-like memory (which has about 10^15 neurons and links) to store
the information, but in practice it is not possible to build such a vast semantic network.
3. These types of representations are inadequate as they do not have any equivalent quantifiers, e.g., for
all, for some, none, etc.
4. Semantic networks do not have any standard definition for the link names.
5. These networks are not intelligent and depend on the creator of the system.
3. Frame Representation
A frame is a record-like structure which consists of a collection of attributes and their values to describe an
entity in the world. Frames are an AI data structure which divides knowledge into substructures by representing
stereotyped situations. A frame consists of a collection of slots and slot values. These slots may be of any type
and size. Slots have names and values, and the aspects of slots are called facets.
Facets: The various aspects of a slot are known as facets. Facets are features of frames which enable us to put
constraints on the frames. Example: IF-NEEDED facets are called when the data of a particular slot is needed. A
frame may consist of any number of slots, a slot may include any number of facets, and a facet may have any
number of values. A frame is also known as slot-filler knowledge representation in artificial intelligence.
Frames are derived from semantic networks and later evolved into our modern-day classes and objects. A
single frame is not of much use by itself; a frame system consists of a collection of connected frames. In a
frame, knowledge about an object or event can be stored together in the knowledge base. The frame is a type
of technology which is widely used in various applications, including natural language processing and machine
vision.
Example 1:
Let's take an example of a frame for a book (only two of its slots are shown here):
Slots    Filter
Year     1996
Page     1152
Example 2:
Let's suppose we are taking an entity, Peter. Peter is an engineer by profession, his age is 25, he lives in the
city of London, and his country is England. So following is the frame representation for this:
Slots        Filter
Name         Peter
Profession   Engineer
Age          25
Weight       78
City         London
Country      England
Advantages of frame representation:
1. The frame knowledge representation makes programming easier by grouping related data.
2. The frame representation is comparatively flexible and is used by many applications in AI.
3. It is very easy to add slots for new attributes and relations.
4. It is easy to include default data and to search for missing values.
5. Frame representation is easy to understand and visualize.
4. Production Rules
A production rules system consists of (condition, action) pairs, which means "If condition then action". The
agent checks for the condition, and if the condition exists then the production rule fires and the
corresponding action is carried out. The condition part of the rule determines which rule may be applied to a
problem, and the action part carries out the associated problem-solving steps. This complete process is called a
recognize-act cycle.
The working memory contains the description of the current state of problem-solving, and rules can write
knowledge to the working memory. This knowledge may then match and fire other rules.
If a new situation (state) is generated, then multiple production rules may be triggered together; this set is
called the conflict set. In this situation, the agent needs to select one rule from the set, and this selection is
called conflict resolution. An example set of production rules is given below, followed by a short sketch of the
recognize-act cycle.
Example:
o IF (at bus stop AND bus arrives) THEN action (get into the bus)
o IF (on the bus AND paid AND empty seat) THEN action (sit down).
o IF (on bus AND unpaid) THEN action (pay charges).
o IF (bus arrives at destination) THEN action (get down from the bus).
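As a minimal Python sketch of the recognize-act cycle over the rules above (the working-memory strings and
the first-match conflict-resolution strategy are illustrative assumptions, not from the text):

# Working memory holds the current state as a set of condition strings;
# each production is a (condition-set, action) pair.
working_memory = {"at bus stop", "bus arrives"}

productions = [
    ({"at bus stop", "bus arrives"}, "get into the bus"),
    ({"on the bus", "paid", "empty seat"}, "sit down"),
    ({"on the bus", "unpaid"}, "pay charges"),
]

# Recognize: collect every rule whose condition part matches working
# memory (this is the conflict set); then resolve the conflict by
# picking the first match, and act on it.
conflict_set = [(cond, act) for cond, act in productions
                if cond <= working_memory]
if conflict_set:
    condition, action = conflict_set[0]   # trivial conflict resolution
    print("Fired:", action)               # -> Fired: get into the bus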
Disadvantages of production rules:
1. The production rule system does not exhibit any learning capabilities, as it does not store the results of
solved problems for future use.
2. During the execution of the program, many rules may be active; hence rule-based production systems
are inefficient.
Propositional logic in Artificial intelligence
Propositional logic (PL) is the simplest form of logic, where all the statements are made in the form of
propositions. A proposition is a declarative statement which is either true or false. It is a technique of knowledge
representation in logical and mathematical form.
Example:
a) It is Sunday.
b) The Sun rises from the West. (False proposition)
c) 3 + 3 = 7. (False proposition)
d) 5 is a prime number.
The syntax of propositional logic defines the allowable sentences for the knowledge representation. There are
two types of Propositions:
a. Atomic Propositions
b. Compound propositions
o Atomic Proposition: Atomic propositions are simple propositions consisting of a single proposition
symbol. These are the sentences which must be either true or false.
Example:
a) "2 + 2 is 4"; it is an atomic proposition as it is a true fact.
b) "The Sun is cold" is also a proposition, as it is a false fact.
o Compound Proposition: Compound propositions are constructed by combining simpler or atomic
propositions using logical connectives.
Example:
a) "It is raining today, and the street is wet."
Logical Connectives:
Logical connectives are used to connect two simpler propositions or to represent a sentence logically. We can
create compound propositions with the help of logical connectives. There are mainly five connectives, which are
given as follows:
1. Negation: A sentence such as ¬P is called the negation of P. A literal can be either a positive literal or a
negative literal.
2. Conjunction: A sentence which has ∧ connective such as, P ∧ Q is called a conjunction.
Example: Rohan is intelligent and hardworking. It can be written as,
P= Rohan is intelligent,
Q= Rohan is hardworking. → P∧ Q.
3. Disjunction: A sentence which has a ∨ connective, such as P ∨ Q, is called a disjunction, where P and Q are
the propositions.
Example: "Ritika is a doctor or engineer."
Here P = Ritika is a doctor, Q = Ritika is an engineer, so we can write it as P ∨ Q.
4. Implication: A sentence such as P → Q, is called an implication. Implications are also known as if-then
rules. It can be represented as
If it is raining, then the street is wet.
Let P= It is raining, and Q= Street is wet, so it is represented as P → Q
5. Biconditional: A sentence such as P ⇔ Q is a biconditional sentence. Example: "If I am breathing,
then I am alive."
P = I am breathing, Q = I am alive; it can be represented as P ⇔ Q.
We can build a compound proposition composing three propositions P, Q, and R. Its truth table is made up of
2^3 = 8 rows, as we have taken three proposition symbols.
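As a minimal sketch, the 8 rows can be enumerated in Python; the compound proposition (P ∧ Q) ∨ R is chosen
here purely as an illustration:

# Enumerate all 2**3 = 8 truth assignments for P, Q, R and evaluate
# the compound proposition (P and Q) or R for each row.
from itertools import product

for P, Q, R in product([True, False], repeat=3):
    print(P, Q, R, (P and Q) or R)   # one row of the truth table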
Precedence of connectives:
Just like arithmetic operators, there is a precedence order for propositional connectors or logical operators. This
order should be followed while evaluating a propositional problem. Following is the list of the precedence order
for operators:
Precedence    Operators
First         Parenthesis
Second        Negation
Third         Conjunction (AND)
Fourth        Disjunction (OR)
Fifth         Implication
Sixth         Biconditional
Note: For better understanding, use parentheses to make sure of the correct interpretation. For example, ¬R ∨ Q
can be interpreted as (¬R) ∨ Q.
Logical equivalence:
Logical equivalence is one of the features of propositional logic. Two propositions are said to be logically
equivalent if and only if their columns in the truth table are identical to each other.
Let's take two propositions A and B; for logical equivalence we can write A ⇔ B. In the truth table below we
can see that the columns for ¬A ∨ B and A → B are identical; hence A → B is equivalent to ¬A ∨ B.
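This equivalence can also be checked mechanically; a minimal sketch (writing implication for booleans as
A <= B, which agrees with A → B on truth values):

# Verify that (not A) or B and (A -> B) have identical truth-table columns.
from itertools import product

print(all(((not A) or B) == (A <= B)
          for A, B in product([True, False], repeat=2)))   # -> True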
Properties of Operators:
o Commutativity:
o P∧ Q= Q ∧ P, or
o P ∨ Q = Q ∨ P.
o Associativity:
o (P ∧ Q) ∧ R= P ∧ (Q ∧ R),
o (P ∨ Q) ∨ R= P ∨ (Q ∨ R)
o Identity element:
o P ∧ True = P,
o P ∨ True= True.
o Distributive:
o P∧ (Q ∨ R) = (P ∧ Q) ∨ (P ∧ R).
o P ∨ (Q ∧ R) = (P ∨ Q) ∧ (P ∨ R).
o De Morgan's Law:
o ¬ (P ∧ Q) = (¬P) ∨ (¬Q)
o ¬ (P ∨ Q) = (¬ P) ∧ (¬Q).
o Double-negation elimination:
o ¬ (¬P) = P.
Limitations of propositional logic:
o We cannot represent relations like ALL, some, or none with propositional logic. Example:
a. All the girls are intelligent.
b. Some apples are sweet.
o Propositional logic has limited expressive power.
o In propositional logic, we cannot describe statements in terms of their properties or logical relationships.
Inference rules:
Inference rules are templates for generating valid arguments. Inference rules are applied to derive proofs in
artificial intelligence, and a proof is a sequence of conclusions that leads to the desired goal.
In inference rules, the implication among all the connectives plays an important role. Following are some
terminologies related to inference rules:
o Implication: One of the logical connectives, represented as P → Q and read as "if P then Q".
o Converse: The converse of the implication P → Q is Q → P.
o Contrapositive: The negation of the converse, i.e., ¬Q → ¬P.
o Inverse: The negation of the implication, i.e., ¬P → ¬Q.
Some of the above compound statements are equivalent to each other, which we can prove using a truth table:
Hence from the truth table we can prove that P → Q is equivalent to ¬Q → ¬P, and Q → P is equivalent
to ¬P → ¬Q.
1. Modus Ponens:
The Modus Ponens rule is one of the most important rules of inference, and it states that if P and P → Q are
true, then we can infer that Q will be true. It can be represented as:
((P → Q) ∧ P) ⊢ Q
Example:
Statement-1: "If I am sleepy then I go to bed." ==> P → Q
Statement-2: "I am sleepy." ==> P
Conclusion: "I go to bed." ==> Q
2. Modus Tollens:
The Modus Tollens rule states that if P → Q is true and ¬Q is true, then ¬P will also be true. It can be
represented as:
((P → Q) ∧ ¬Q) ⊢ ¬P
3. Hypothetical Syllogism:
The Hypothetical Syllogism rule states that if P → Q is true and Q → R is true, then P → R will be true. It can be
represented with the following notation:
((P → Q) ∧ (Q → R)) ⊢ (P → R)
Example:
Statement-1: If you have my home key then you can unlock my home. P→Q
Statement-2: If you can unlock my home then you can take my money. Q→R
Conclusion: If you have my home key then you can take my money. P→R
4. Disjunctive Syllogism:
The Disjunctive Syllogism rule states that if P ∨ Q is true and ¬P is true, then Q will be true. It can be
represented as:
((P ∨ Q) ∧ ¬P) ⊢ Q
Example:
Statement-1: "Today is Sunday or Monday." ==> P ∨ Q
Statement-2: "Today is not Sunday." ==> ¬P
Conclusion: "Today is Monday." ==> Q
Proof by truth-table:
5. Addition:
The Addition rule is one of the common inference rules, and it states that if P is true, then P ∨ Q will be true. It
can be represented as:
P ⊢ (P ∨ Q)
Example:
Statement-1: I have a vanilla ice-cream. ==> P
Statement-2: I have chocolate ice-cream. ==> Q
Conclusion: I have vanilla or chocolate ice-cream. ==> (P ∨ Q)
Proof by Truth-Table:
6. Simplification:
The Simplification rule states that if P ∧ Q is true, then P or Q will also be true. It can be represented as:
(P ∧ Q) ⊢ P and (P ∧ Q) ⊢ Q
Proof by Truth-Table:
7. Resolution:
The Resolution rule states that if P ∨ Q and ¬P ∨ R are true, then Q ∨ R will also be true. It can be represented
as:
((P ∨ Q) ∧ (¬P ∨ R)) ⊢ (Q ∨ R)
Proof by Truth-Table:
What is Unification?
o Unification is the process of making two different logical atomic expressions identical by finding a
substitution. Unification depends on the substitution process.
o It takes two literals as input and makes them identical using a substitution.
o Let Ψ1 and Ψ2 be two atomic sentences and 𝜎 be a unifier such that Ψ1𝜎 = Ψ2𝜎; then this can be
expressed as UNIFY(Ψ1, Ψ2).
o Example: Find the MGU for Unify{King(x), King(John)}.
The substitution θ = {John/x} is a unifier for these atoms; applying this substitution makes both expressions
identical.
o The UNIFY algorithm is used for unification, which takes two atomic sentences and returns a unifier for
those sentences (If any exist).
o Unification is a key component of all first-order inference algorithms.
o It returns fail if the expressions do not match with each other.
o The most general such substitution is called the Most General Unifier or MGU.
E.g., let's say there are two different expressions, P(x, y) and P(a, f(z)).
In this example, we need to make both of the above expressions identical. For this, we perform the
substitution θ = {a/x, f(z)/y}, after which both expressions become P(a, f(z)).
Following are some basic conditions for unification:
o The predicate symbols must be the same; atoms or expressions with different predicate symbols can
never be unified.
o The number of arguments in both expressions must be identical.
o Unification will fail if two similar variables are present in the same expression.
Unification Algorithm:
Algorithm: Unify(Ψ1, Ψ2)
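The step-by-step algorithm body is not reproduced here; as a minimal Python sketch of the same idea (the
term encoding is an assumption: variables are strings prefixed with "?", and compound expressions are tuples
whose first element is the predicate or function symbol, so King(John) becomes ("King", "John")):

def is_variable(t):
    # Variables are encoded as strings starting with "?", e.g. "?x".
    return isinstance(t, str) and t.startswith("?")

def substitute(t, theta):
    # Apply the substitution theta to term t, recursively.
    if is_variable(t):
        return substitute(theta[t], theta) if t in theta else t
    if isinstance(t, tuple):
        return tuple(substitute(arg, theta) for arg in t)
    return t

def unify(x, y, theta=None):
    # Return a most general unifier of x and y, or None on failure.
    if theta is None:
        theta = {}
    x, y = substitute(x, theta), substitute(y, theta)
    if x == y:
        return theta
    if is_variable(x):
        theta[x] = y                  # occurs-check omitted for brevity
        return theta
    if is_variable(y):
        theta[y] = x
        return theta
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        # Predicate/function symbols and argument counts must match.
        for xi, yi in zip(x, y):
            theta = unify(xi, yi, theta)
            if theta is None:
                return None
        return theta
    return None

# Example 5 below: Q(a, g(x, a), f(y)) vs. Q(a, g(f(b), a), x)
t1 = ("Q", "a", ("g", "?x", "a"), ("f", "?y"))
t2 = ("Q", "a", ("g", ("f", "b"), "a"), "?x")
print(unify(t1, t2))   # -> {'?x': ('f', 'b'), '?y': 'b'}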
For each pair of the following atomic sentences, find the most general unifier (if it exists).
5. Find the MGU of {Q(a, g(x, a), f(y)), Q(a, g(f(b), a), x)}
SUBST θ = {f(b)/x}
S1 => {Q(a, g(f(b), a), f(y)); Q(a, g(f(b), a), f(b))}
SUBST θ = {b/y}
S2 => {Q(a, g(f(b), a), f(b)); Q(a, g(f(b), a), f(b))}, successfully unified.
Resolution in FOL
Resolution
Resolution is a theorem-proving technique that proceeds by building refutation proofs, i.e., proofs by
contradiction. It was invented by the mathematician John Alan Robinson in 1965.
Resolution is used when various statements are given and we need to prove a conclusion from those
statements. Unification is a key concept in proofs by resolution. Resolution is a single inference rule which can
efficiently operate on the conjunctive normal form or clausal form.
Clause: A disjunction of literals (atomic sentences) is called a clause. A clause containing exactly one literal is
known as a unit clause.
This rule is also called the binary resolution rule because it only resolves exactly two literals.
Example:
Here the two complementary literals are: Loves(f(x), x) and ¬Loves(a, b).
These literals can be unified with the unifier θ = {f(x)/a, x/b}, and resolving on them generates the resolvent
clause:
To better understand all the above steps, we will take an example in which we will apply resolution.
Example:
In first-order logic resolution, it is required to convert the FOL statements into CNF, as the CNF form makes
resolution proofs easier.
Note: Statements "food(Apple) Λ food(vegetables)" and "eats (Anil, Peanuts) Λ alive(Anil)" can be written in two
separate statements.
In this step, we will apply negation to the conclusion statement, which will be written as ¬likes(John,
Peanuts).
Now in this step, we will solve the problem by resolution tree using substitution. For the above problem, it will
be given as follows:
Hence the negation of the conclusion has been proved to be a complete contradiction with the given set of
statements.
o In the first step of the resolution graph, ¬likes(John, Peanuts) and likes(John, x) get resolved (canceled)
by the substitution {Peanuts/x}, and we are left with ¬food(Peanuts).
o In the second step of the resolution graph, ¬food(Peanuts) and food(z) get resolved (canceled) by the
substitution {Peanuts/z}, and we are left with ¬eats(y, Peanuts) V killed(y).
o In the third step of the resolution graph, ¬eats(y, Peanuts) and eats(Anil, Peanuts) get resolved by the
substitution {Anil/y}, and we are left with killed(Anil).
o In the fourth step of the resolution graph, killed(Anil) and ¬killed(k) get resolved by the
substitution {Anil/k}, and we are left with ¬alive(Anil).
o In the last step of the resolution graph, ¬alive(Anil) and alive(Anil) get resolved.
The inference engine commonly proceeds in two modes, which are:
a. Forward chaining
b. Backward chaining
Horn clauses and definite clauses are forms of sentences which enable a knowledge base to use a more
restricted and efficient inference algorithm. Logical inference algorithms use forward and backward chaining
approaches, which require the KB in the form of first-order definite clauses.
Definite clause: A clause which is a disjunction of literals with exactly one positive literal is known as a
definite clause or strict Horn clause.
Horn clause: A clause which is a disjunction of literals with at most one positive literal is known as a Horn
clause. Hence all definite clauses are Horn clauses.
Example: (¬p V ¬q V k) is a definite clause; it is equivalent to p ∧ q → k.
A. Forward Chaining
Forward chaining is also known as forward deduction or the forward reasoning method when using an inference
engine. Forward chaining is a form of reasoning which starts with atomic sentences in the knowledge base and
applies inference rules (Modus Ponens) in the forward direction to extract more data until a goal is reached.
The forward-chaining algorithm starts from known facts, triggers all rules whose premises are satisfied, and
adds their conclusions to the known facts. This process repeats until the problem is solved.
Properties of Forward-Chaining:
o It is a bottom-up approach, as it moves from bottom to top.
o It is a process of making a conclusion based on known facts or data, starting from the initial state and
reaching the goal state.
o The forward-chaining approach is also called data-driven, as we reach the goal using available data.
o The forward-chaining approach is commonly used in expert systems and production rule systems.
Example:
"As per the law, it is a crime for an American to sell weapons to hostile nations. Country A, an enemy of
America, has some missiles, and all the missiles were sold to it by Robert, who is an American citizen."
To solve the above problem, first, we will convert all the above facts into first-order definite clauses, and then
we will use a forward-chaining algorithm to reach the goal.
o It is a crime for an American to sell weapons to hostile nations. (Let's say p, q, and r are variables.)
American(p) ∧ Weapon(q) ∧ Sells(p, q, r) ∧ Hostile(r) → Criminal(p) ...(1)
o Country A has some missiles: ∃p Owns(A, p) ∧ Missile(p). This can be written in two definite clauses by
using Existential Instantiation, introducing a new constant T1.
Owns(A, T1) ......(2)
Missile(T1) .......(3)
o All of the missiles were sold to country A by Robert.
∀p Missile(p) ∧ Owns(A, p) → Sells(Robert, p, A) ......(4)
o Missiles are weapons.
Missile(p) → Weapon(p) .......(5)
o Enemy of America is known as hostile.
Enemy(p, America) →Hostile(p) ........(6)
o Country A is an enemy of America.
Enemy (A, America) .........(7)
o Robert is American
American(Robert). ..........(8)
Step-1:
In the first step we will start with the known facts and choose the sentences which do not have implications,
such as: American(Robert), Enemy(A, America), Owns(A, T1), and Missile(T1). All these facts will be
represented as below.
Step-2:
In the second step, we will see which facts can be inferred from the available facts with satisfied premises.
Rule (1) does not have its premises satisfied, so it will not be added in the first iteration.
Rule (4) is satisfied with the substitution {p/T1}, so Sells(Robert, T1, A) is added, which is inferred from the
conjunction of Rules (2) and (3).
Rule (5) is satisfied with the substitution {p/T1}, so Weapon(T1) is added, which is inferred from Rule (3).
Rule (6) is satisfied with the substitution {p/A}, so Hostile(A) is added, which is inferred from Rule (7).
Step-3:
At step-3, we can check that Rule (1) is satisfied with the substitution {p/Robert, q/T1, r/A}, so we can add
Criminal(Robert), which is inferred from all the available facts. Hence we have reached our goal statement.
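As a minimal Python sketch of this forward-chaining run (for brevity the first-order rules are pre-instantiated
with the constants Robert, T1, and A, rather than matched by full unification):

# Known facts from clauses (2), (3), (7), (8).
facts = {"American(Robert)", "Missile(T1)", "Owns(A,T1)", "Enemy(A,America)"}

# Each rule is a (set-of-premises, conclusion) pair, from (1), (4), (5), (6).
rules = [
    ({"Missile(T1)"}, "Weapon(T1)"),
    ({"Missile(T1)", "Owns(A,T1)"}, "Sells(Robert,T1,A)"),
    ({"Enemy(A,America)"}, "Hostile(A)"),
    ({"American(Robert)", "Weapon(T1)", "Sells(Robert,T1,A)", "Hostile(A)"},
     "Criminal(Robert)"),
]

# Fire every rule whose premises are all known facts and add its
# conclusion; repeat until no new fact is produced (a fixed point).
changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print("Criminal(Robert)" in facts)   # -> True: the goal is reached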
B. Backward Chaining:
Backward chaining is also known as backward deduction or the backward reasoning method when using an
inference engine. A backward-chaining algorithm is a form of reasoning which starts with the goal and works
backward, chaining through rules to find known facts that support the goal.
Example:
In backward-chaining, we will use the same above example, and will rewrite all the rules.
Backward-Chaining proof:
In Backward chaining, we will start with our goal predicate, which is Criminal(Robert), and then infer further
rules.
Step-1:
In the first step, we will take the goal fact, infer other facts from it, and at last prove those facts true. So our
goal fact is "Robert is Criminal," and following is the predicate for it.
Step-2:
In the second step, we will infer other facts from the goal fact which satisfy the rules. As we can see in Rule (1),
the goal predicate Criminal(Robert) is present with the substitution {Robert/p}. So we will add all the
conjunctive facts below the first level and replace p with Robert.
Step-3:
At step-3, we will extract a further fact Missile(q), which is inferred from Weapon(q), as it satisfies Rule (5).
Weapon(q) is also true with the substitution of the constant T1 at q.
Step-4:
At step-4, we can infer the facts Missile(T1) and Owns(A, T1) from Sells(Robert, T1, r), which satisfies Rule (4)
with the substitution of A in place of r. So these two statements are proved here.
Step-5:
At step-5, we can infer the fact Enemy(A, America) from Hostile(A), which satisfies Rule (6). And hence all the
statements are proved true using backward chaining.
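The same pre-instantiated facts and rules from the forward-chaining sketch can be queried backward; a
minimal sketch:

facts = {"American(Robert)", "Missile(T1)", "Owns(A,T1)", "Enemy(A,America)"}
rules = [
    ({"Missile(T1)"}, "Weapon(T1)"),
    ({"Missile(T1)", "Owns(A,T1)"}, "Sells(Robert,T1,A)"),
    ({"Enemy(A,America)"}, "Hostile(A)"),
    ({"American(Robert)", "Weapon(T1)", "Sells(Robert,T1,A)", "Hostile(A)"},
     "Criminal(Robert)"),
]

def prove(goal):
    # A goal holds if it is a known fact, or if some rule concludes it
    # and every one of that rule's premises can itself be proved.
    if goal in facts:
        return True
    return any(concl == goal and all(prove(p) for p in prems)
               for prems, concl in rules)

print(prove("Criminal(Robert)"))   # -> True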
Difference between backward chaining and forward chaining
Following is the difference between the forward chaining and backward chaining:
o Forward chaining, as the name suggests, starts from the known facts and moves forward by applying
inference rules to extract more data, continuing until it reaches the goal; whereas backward
chaining starts from the goal and moves backward by using inference rules to determine the facts that
satisfy the goal.
o Forward chaining is called a data-driven inference technique, whereas backward chaining is called
a goal-driven inference technique.
o Forward chaining is known as the bottom-up approach, whereas backward chaining is known as the top-
down approach.
o Forward chaining uses breadth-first search strategy, whereas backward chaining uses depth-first
search strategy.
o Forward and backward chaining both apply the Modus Ponens inference rule.
o Forward chaining can be used for tasks such as planning, design process monitoring, diagnosis, and
classification, whereas backward chaining can be used for classification and diagnosis tasks.
o Forward chaining can be like an exhaustive search, whereas backward chaining tries to avoid the
unnecessary path of reasoning.
o In forward-chaining there can be various ASK questions from the knowledge base, whereas in
backward chaining there can be fewer ASK questions.
o Forward chaining is slow as it checks all the rules, whereas backward chaining is fast as it checks only the
few required rules.
1. Forward chaining starts from known facts and applies inference rules to extract more data until it
reaches the goal. Backward chaining starts from the goal and works backward through inference rules to
find the required facts that support the goal.
2. Forward chaining is known as a data-driven inference technique, as we reach the goal using the
available data. Backward chaining is known as a goal-driven technique, as we start from the goal and
divide it into sub-goals to extract the facts.
3. Forward chaining reasoning applies a breadth-first search strategy. Backward chaining reasoning applies
a depth-first search strategy.
4. Forward chaining tests all the available rules. Backward chaining only tests the few required rules.
5. Forward chaining is suitable for planning, monitoring, control, and interpretation applications. Backward
chaining is suitable for diagnostic, prescription, and debugging applications.
6. Forward chaining can generate an infinite number of possible conclusions. Backward chaining generates
a finite number of possible conclusions.
7. Forward chaining is aimed at any conclusion. Backward chaining is aimed only at the required data.
Reasoning:
Reasoning is the mental process of deriving logical conclusions and making predictions from available
knowledge, facts, and beliefs. Or we can say: "Reasoning is a way to infer facts from existing data." It is a
general process of thinking rationally to find valid conclusions.
In artificial intelligence, reasoning is essential so that the machine can also think rationally like a human brain
and can perform like a human.
Types of Reasoning
In artificial intelligence, reasoning can be divided into the following categories:
o Deductive reasoning
o Inductive reasoning
o Abductive reasoning
o Common Sense Reasoning
o Monotonic Reasoning
o Non-monotonic Reasoning
Note: Inductive and deductive reasoning are the forms of propositional logic.
1. Deductive reasoning:
Deductive reasoning is deducing new information from logically related known information. It is a form of
valid reasoning, which means the argument's conclusion must be true when the premises are true.
Deductive reasoning is a type of propositional logic in AI, and it requires various rules and facts. It is sometimes
referred to as top-down reasoning, in contrast to inductive reasoning.
In deductive reasoning, the truth of the premises guarantees the truth of the conclusion.
Deductive reasoning mostly starts from general premises and moves to a specific conclusion, which can be
explained by the example below.
Example:
Premise-1: All humans eat veggies.
Premise-2: Suresh is human.
Conclusion: Suresh eats veggies.
2. Inductive Reasoning:
Inductive reasoning is a form of reasoning that arrives at a conclusion using limited sets of facts by the process
of generalization. It starts with a series of specific facts or data and reaches a general statement or conclusion.
Inductive reasoning is a type of propositional logic, which is also known as cause-effect reasoning or bottom-
up reasoning.
In inductive reasoning, we use historical data or various premises to generate a generic rule, for which premises
support the conclusion.
In inductive reasoning, premises provide probable supports to the conclusion, so the truth of premises does not
guarantee the truth of the conclusion.
Example:
Premise: All of the pigeons we have seen in the zoo are white.
Conclusion: Therefore, we can expect all pigeons to be white.
3. Abductive reasoning:
Abductive reasoning is a form of logical reasoning which starts with single or multiple observations and then
seeks the most likely explanation or conclusion for the observations.
Abductive reasoning is an extension of deductive reasoning, but in abductive reasoning, the premises do not
guarantee the conclusion.
Example:
Implication: The cricket ground is wet if it is raining.
Axiom: The cricket ground is wet.
Conclusion: It is raining.
4. Common Sense Reasoning:
Common sense reasoning is an informal form of reasoning which can be gained through experience.
Common sense reasoning simulates the human ability to make presumptions about events which occur
every day.
It relies on good judgment rather than exact logic and operates on heuristic knowledge and heuristic rules.
Example:
1. One person can be at one place at a time.
2. If I put my hand in a fire, then it will burn.
The above two statements are the examples of common sense reasoning which a human mind can easily
understand and assume.
5. Monotonic Reasoning:
In monotonic reasoning, once a conclusion is drawn, it remains the same even if we add other
information to the existing information in our knowledge base. In monotonic reasoning, adding knowledge
does not decrease the set of propositions that can be derived.
To solve monotonic problems, we can derive the valid conclusion from the available facts only, and it will not be
affected by new facts.
Monotonic reasoning is not useful for real-time systems, because in real time facts change, so we cannot
use monotonic reasoning there.
Monotonic reasoning is used in conventional reasoning systems, and a logic-based system is monotonic.
Example:
"The Earth revolves around the Sun."
This is a true fact, and it cannot be changed even if we add another sentence to the knowledge base, such as
"The moon revolves around the earth" or "The Earth is not round."
6. Non-monotonic Reasoning
In Non-monotonic reasoning, some conclusions may be invalidated if we add some more information to our
knowledge base.
Logic will be said as non-monotonic if some conclusions can be invalidated by adding more knowledge into
our knowledge base.
"Human perceptions for various things in daily life, "is a general example of non-monotonic reasoning.
Example: Let's suppose the knowledge base contains the following knowledge:
o Birds can fly.
o Penguins cannot fly.
o Pitty is a bird.
From the above sentences, we can conclude that Pitty can fly.
However, if we add another sentence to the knowledge base, "Pitty is a penguin", this concludes that "Pitty
cannot fly", which invalidates the above conclusion.
Advantages of non-monotonic reasoning:
o For real-world systems such as robot navigation, we can use non-monotonic reasoning.
o In non-monotonic reasoning, we can choose probabilistic facts or can make assumptions.
Disadvantages of non-monotonic reasoning:
o In non-monotonic reasoning, old facts may be invalidated by adding new sentences.
o It cannot be used for theorem proving.
Difference between inductive and deductive reasoning:
o Deductive reasoning uses available facts, information, or knowledge to deduce a valid conclusion,
whereas inductive reasoning involves making a generalization from specific facts and observations.
o Deductive reasoning uses a top-down approach, whereas inductive reasoning uses a bottom-up
approach.
o Deductive reasoning moves from generalized statement to a valid conclusion, whereas Inductive
reasoning moves from specific observation to a generalization.
o In deductive reasoning, the conclusions are certain, whereas, in Inductive reasoning, the conclusions
are probabilistic.
o Deductive arguments can be valid or invalid: if the premises are true, the conclusion must be
true. Inductive arguments can be strong or weak: the conclusion may be false even if the
premises are true.
The differences between inductive and deductive can be explained using the below diagram on the basis of
arguments:
Comparison Chart:
Definition: Deductive reasoning is the form of valid reasoning used to deduce new information or conclusions
from known related facts and information. Inductive reasoning arrives at a conclusion by the process of
generalization using specific facts or data.
Starts from: Deductive reasoning starts from premises. Inductive reasoning starts from the conclusion.
Validity: In deductive reasoning, the conclusion must be true if the premises are true. In inductive reasoning,
the truth of the premises does not guarantee the truth of the conclusions.
Usage: Use of deductive reasoning is difficult, as we need facts which must be true. Use of inductive reasoning
is fast and easy, as we need evidence instead of true facts; we often use it in our daily life.
Argument: In deductive reasoning, arguments may be valid or invalid. In inductive reasoning, arguments may
be weak or strong.
Structure: Deductive reasoning reaches from general facts to specific facts. Inductive reasoning reaches from
specific facts to general facts.
So to represent uncertain knowledge, where we are not sure about the predicates, we need uncertain reasoning
or probabilistic reasoning.
Causes of uncertainty:
Following are some leading causes of uncertainty in the real world:
1. Information obtained from unreliable sources.
2. Experimental errors.
3. Equipment faults.
4. Temperature variations.
5. Climate change.
Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation where we apply the concept of probability to
indicate the uncertainty in knowledge. In probabilistic reasoning, we combine probability theory with logic to
handle the uncertainty.
We use probability in probabilistic reasoning because it provides a way to handle the uncertainty that is the
result of someone's laziness and ignorance.
In the real world, there are lots of scenarios where the certainty of something is not confirmed, such as "It will
rain today," "the behavior of someone in some situation," or "a match between two teams or two players."
These are probable sentences for which we can assume that they will happen but are not sure about them, so
here we use probabilistic reasoning.
In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
o Bayes' rule
o Bayesian Statistics
As probabilistic reasoning uses probability and related terms, so before understanding probabilistic reasoning,
let's understand some common terms:
Probability: Probability can be defined as the chance that an uncertain event will occur. It is the numerical
measure of the likelihood that an event will occur. The value of probability always remains between 0 and 1,
where P(A) = 0 indicates total impossibility of an event A and P(A) = 1 indicates total certainty.
We can find the probability of an uncertain event by using the formula below:
Probability of occurrence = (number of desired outcomes) / (total number of outcomes)
Sample space: The collection of all possible events is called sample space.
Random variables: Random variables are used to represent the events and objects in the real world.
Prior probability: The prior probability of an event is probability computed before observing new information.
Posterior Probability: The probability that is calculated after all evidence or information has been taken into
account. It is a combination of the prior probability and the new information.
Conditional probability:
Conditional probability is the probability of an event occurring when another event has already happened.
Let's suppose we want to calculate the probability of event A when event B has already occurred, "the
probability of A under the conditions of B". It can be written as:
P(A|B) = P(A⋀B) / P(B)
where P(A⋀B) is the joint probability of A and B, and P(B) is the marginal probability of B.
If the probability of A is given and we need to find the probability of B, then it will be given as:
P(B|A) = P(A⋀B) / P(A)
This can be explained using the Venn diagram below: since event B has occurred, the sample space is reduced
to the set B, and we can calculate event A given that B has occurred by dividing the probability of P(A⋀B) by
P(B).
Example:
In a class, 70% of the students like English and 40% of the students like both English and mathematics. What
percentage of the students who like English also like mathematics?
Solution:
P(Mathematics|English) = P(English ⋀ Mathematics) / P(English) = 0.40 / 0.70 = 0.57 (approximately)
Hence, 57% of the students who like English also like mathematics.
Bayes' theorem:
Bayes' theorem, also known as Bayes' rule or Bayes' law, determines the probability of an event with uncertain
knowledge. In probability theory, it relates the conditional probability and the marginal probabilities of two
random events.
Bayes' theorem was named after the British mathematician Thomas Bayes. The Bayesian inference is an
application of Bayes' theorem, which is fundamental to Bayesian statistics.
Bayes' theorem allows updating the probability prediction of an event by observing new information of the real
world.
Example: If cancer corresponds to one's age then by using Bayes' theorem, we can determine the probability of
cancer more accurately with the help of age.
Bayes' theorem can be derived using the product rule and the conditional probability of event A with known
event B. From the product rule we can write P(A⋀B) = P(A|B) P(B), and similarly P(A⋀B) = P(B|A) P(A).
Equating the two and dividing by P(B) gives:
P(A|B) = P(B|A) P(A) / P(B) ...(a)
The above equation (a) is called Bayes' rule or Bayes' theorem. It shows the simple relationship between joint
and conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate; it is read as the probability of hypothesis A
given that evidence B has occurred.
P(B|A) is called the likelihood: assuming the hypothesis is true, it is the probability of observing the
evidence.
P(A) is called the prior probability, the probability of the hypothesis before considering the evidence.
P(B) is called the marginal probability, the pure probability of the evidence.
In equation (a), in general, we can write P(B) = Σi P(Ai) P(B|Ai); hence the Bayes' rule can be written as:
P(Ai|B) = P(Ai) P(B|Ai) / Σk P(Ak) P(B|Ak)
where A1, A2, A3, ........, An is a set of mutually exclusive and exhaustive events.
Example-1:
Question: What is the probability that a patient has the disease meningitis, given that they have a stiff neck?
Given Data:
A doctor is aware that the disease meningitis causes a patient to have a stiff neck 80% of the time. He is also
aware of some more facts: the probability that any patient has meningitis is 1/30000, and the probability that
any patient has a stiff neck is 2%. Let a be the proposition that the patient has a stiff neck and b be the
proposition that the patient has meningitis; then:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
Applying Bayes' rule: P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 = 1/750 ≈ 0.0013
Hence, we can assume that about 1 patient out of 750 patients with a stiff neck has meningitis.
Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The probability that the card is a
king is 4/52. Calculate the posterior probability P(King|Face), i.e., the probability that the drawn card is a
king given that it is a face card.
Solution:
P(King|Face) = P(Face|King) P(King) / P(Face)
P(Face|King) = 1, since every king is a face card.
P(King) = 4/52 and P(Face) = 12/52, since there are 12 face cards (3 per suit) in a deck of 52 cards.
P(King|Face) = (1 × 4/52) / (12/52) = 4/12 = 1/3
Hence, the probability that a drawn face card is a king is 1/3.
Application of Bayes' theorem in artificial intelligence:
Following are some applications of Bayes' theorem:
o It is used to calculate the next step of the robot when the already executed step is given.
o Bayes' theorem is helpful in weather forecasting.
o It can solve the Monty Hall problem.
"A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional
dependencies using a directed acyclic graph."
It is also called a Bayes network, belief network, decision network, or Bayesian model.
Bayesian networks are probabilistic, because these networks are built from a probability distribution, and also
use probability theory for prediction and anomaly detection.
Real-world applications are probabilistic in nature, and to represent the relationships among multiple events
we need a Bayesian network. It can also be used in various tasks including prediction, anomaly detection,
diagnostics, automated insight, reasoning, time-series prediction, and decision making under
uncertainty.
A Bayesian network can be used for building models from data and experts' opinions, and it consists of two
parts:
o Directed acyclic graph
o Table of conditional probabilities
The generalized form of a Bayesian network that represents and solves decision problems under uncertain
knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and Arcs (directed links), where:
o Each node corresponds to the random variables, and a variable can be continuous or discrete.
o Arc or directed arrows represent the causal relationship or conditional probabilities between random
variables. These directed links or arrows connect the pair of nodes in the graph.
These links represent that one node directly influences the other node; if there is no directed link,
the nodes are independent of each other.
o In the above diagram, A, B, C, and D are random variables represented by the nodes of
the network graph.
o If we consider node B, which is connected to node A by a directed arrow, then
node A is called the parent of node B.
o Node C is independent of node A.
Note: The Bayesian network graph does not contain any cyclic graph. Hence, it is known as a directed acyclic
graph or DAG.
A Bayesian network graph has mainly two components:
o Causal component
o Actual numbers
Each node in the Bayesian network has a conditional probability distribution P(Xi | Parent(Xi)), which
determines the effect of the parents on that node.
Bayesian network is based on Joint probability distribution and conditional probability. So let's first understand
the joint probability distribution:
If we have variables x1, x2, x3, ....., xn, then by the chain rule the joint probability distribution can be written as:
P[x1, x2, x3, ....., xn] = P[x1 | x2, x3, ....., xn] P[x2, x3, ....., xn]
= P[x1 | x2, x3, ....., xn] P[x2 | x3, ....., xn] .... P[xn-1 | xn] P[xn]
In general, for each variable Xi, we can write the equation as:
P(Xi | Xi-1, ....., X1) = P(Xi | Parents(Xi))
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably responds to a
burglary but also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken
responsibility for informing Harry at work when they hear the alarm. David always calls Harry when he hears
the alarm, but sometimes he gets confused with the phone ringing and calls then too. On the other hand,
Sophia likes to listen to loud music, so sometimes she misses the alarm. Here we would like to compute the
probability of the burglar alarm.
Problem:
Calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has
occurred, and both David and Sophia have called Harry.
Solution:
o The Bayesian network for the above problem is given below. The network structure shows that
burglary and earthquake are the parent nodes of the alarm and directly affect the probability of
the alarm going off, while David's and Sophia's calls depend on the alarm probability.
o The network encodes our assumptions: the neighbors do not directly perceive the burglary, do not
notice minor earthquakes, and do not confer before calling.
o The conditional distributions for each node are given as conditional probabilities table or CPT.
o Each row in the CPT must sum to 1, because all the entries in the table represent an exhaustive set
of cases for the variable.
o In a CPT, a boolean variable with k boolean parents contains 2^k rows for the combinations of parent
values. Hence, if there are two parents, then the CPT will contain 4 probability values.
The list of all events occurring in this network:
o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)
o Sophia calls(S)
We can write the events of the problem statement in the form of the probability P[D, S, A, B, E], and we can
rewrite this probability statement using the joint probability distribution:
P[D, S, A, B, E] = P[D | A] P[S | A] P[A | B, E] P[B] P[E]
Let's take the observed probabilities for the Burglary and Earthquake components:
P(B = True) = 0.002, which is the probability of a burglary.
P(B = False) = 0.998, which is the probability of no burglary.
P(E = True) = 0.001, which is the probability of a minor earthquake.
P(E = False) = 0.999, which is the probability that an earthquake has not occurred.
The conditional probability of David calling depends on the probability of the alarm.
The conditional probability of Sophia calling depends on its parent node, "Alarm."
From the formula of the joint distribution, we can write the problem statement in the form of a probability
distribution:
P(S, D, A, ¬B, ¬E) = P(S|A) P(D|A) P(A|¬B ∧ ¬E) P(¬B) P(¬E)
= 0.75 × 0.91 × 0.001 × 0.998 × 0.999
= 0.00068045.
Hence, a Bayesian network can answer any query about the domain by using Joint distribution.
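As a minimal Python sketch of this query (the CPT values below are the ones implied by the worked answer
0.00068045; the full conditional probability tables are not reproduced in the text, so treat these numbers as
reconstructed assumptions):

# Probabilities used in P(D, S, A, ~B, ~E) = P(D|A) P(S|A) P(A|~B,~E) P(~B) P(~E)
P_B = 0.002                   # P(Burglary = True)
P_E = 0.001                   # P(Earthquake = True)
P_A_given_notB_notE = 0.001   # P(Alarm | no burglary, no earthquake)
P_D_given_A = 0.91            # P(David calls | Alarm)
P_S_given_A = 0.75            # P(Sophia calls | Alarm)

p = P_D_given_A * P_S_given_A * P_A_given_notB_notE * (1 - P_B) * (1 - P_E)
print(round(p, 8))            # -> 0.00068045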