PROGRAM-10
Implement a Q-learning algorithm to train an agent to navigate a simple grid environment, defining the reward structure and analyzing the agent's performance.
import numpy as np
# Define the environment
n_states = 16 # Number of states in the grid world
n_actions = 4 # Number of possible actions (up, down, left, right)
goal_state = 15 # Goal state
# Initialize Q-table with zeros
Q_table = np.zeros((n_states, n_actions))
# Define hyperparameters
learning_rate = 0.8      # alpha: step size of the Q-value update
discount_factor = 0.95   # gamma: weight given to future rewards
exploration_prob = 0.2   # epsilon: probability of taking a random action
epochs = 1000            # number of training episodes
# Q-learning algorithm
for epoch in range(epochs):
    current_state = np.random.randint(0, n_states)  # Start from a random state
    while current_state != goal_state:
        # Choose an action with the epsilon-greedy strategy
        if np.random.rand() < exploration_prob:
            action = np.random.randint(0, n_actions)      # Explore
        else:
            action = np.argmax(Q_table[current_state])    # Exploit
        # Simulate the environment: for simplicity the transition is deterministic
        # and always moves to the next state in sequence, regardless of the action
        next_state = (current_state + 1) % n_states
        # Simple reward function: 1 if the goal state is reached, 0 otherwise
        reward = 1 if next_state == goal_state else 0
        # Update the Q-value using the Q-learning update rule
        Q_table[current_state, action] += learning_rate * \
            (reward + discount_factor * np.max(Q_table[next_state])
             - Q_table[current_state, action])
        current_state = next_state  # Move to the next state
# After training, the Q-table holds the learned Q-values
print("Learned Q-table:")
print(Q_table)
Output:

Learned Q-table:
[[0.48767498 0.48751892 0.48751892 0.46816798]
[0.51334208 0.51330923 0.51334207 0.50923535]
[0.54036009 0.5403255 0.54036003 0.5403587 ]
[0.56880009 0.56880009 0.56880008 0.56880009]
[0.59873694 0.59873694 0.59873694 0.59873694]
[0.63024941 0.63024941 0.63024941 0.63024941]
[0.66342043 0.66342043 0.66342043 0.66342043]
[0.6983373 0.6983373 0.6983373 0.6983373 ]
[0.73509189 0.73509189 0.73509189 0.73509189]
[0.77378094 0.77378094 0.77378094 0.77378094]
[0.81450625 0.81450625 0.81450625 0.81450625]
[0.857375 0.857375 0.857375 0.857375 ]
[0.9025 0.9025 0.9025 0.9025 ]
[0.95 0.95 0.95 0.95 ]
[1. 1. 1. 1. ]
[0. 0. 0. 0. ]]
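The problem statement also asks for an analysis of agent performance, which the listing above does not carry out. Below is a minimal sketch of such an analysis, assuming the variables from the listing (Q_table, goal_state, n_states, discount_factor) are still in scope; the names greedy_policy and run_episode are illustrative helpers, not part of the original program. Because the simplified transition always moves to (current_state + 1) % n_states regardless of the chosen action, the optimal Q-value of every action in state s is discount_factor ** (14 - s), which is exactly the geometric pattern visible in the printed table (1.0 for state 14, 0.95 for state 13, 0.9025 for state 12, down to about 0.4877 for state 0; the row for state 15 stays at zero because no update is ever made from the goal state).

# Extract the greedy policy: the highest-valued action in each state
# (illustrative helper, not part of the original listing)
greedy_policy = np.argmax(Q_table, axis=1)
print("Greedy action per state:", greedy_policy)

def run_episode(start_state, max_steps=100):
    """Follow the greedy policy from start_state and count steps to the goal."""
    state, steps = start_state, 0
    while state != goal_state and steps < max_steps:
        action = greedy_policy[state]        # exploit only, no exploration
        state = (state + 1) % n_states       # same simplified transition as in training
        steps += 1
    return steps

# Average number of steps to reach the goal from every non-goal start state
steps_per_start = [run_episode(s) for s in range(n_states) if s != goal_state]
print("Average steps to goal:", np.mean(steps_per_start))

# Compare the learned values with the theoretical optimum gamma^(14 - s)
theoretical = np.array([discount_factor ** (goal_state - 1 - s) for s in range(goal_state)])
print("Max absolute error vs. theory:",
      np.abs(Q_table[:goal_state].max(axis=1) - theoretical).max())

With this deterministic transition every episode from state s takes exactly 15 - s steps, so the average over the 15 non-goal start states is 8; the comparison against the theoretical optimum indicates how close the learned values are after 1000 training episodes.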