21CSE417T - Fundamentals of Reinforcement Learning Syllabus

Course Code: 21CSE417T
Course Name: Reinforcement Learning Techniques
Course Category: E (Professional Elective)
L-T-P-C: 2-1-0-3
Pre-requisite Courses: Nil
Co-requisite Courses: Nil
Progressive Courses: Nil
Course Offering Department: School of Computing
Data Book / Codes / Standards: Nil

Course Learning Rationale (CLR): The purpose of learning this course is to:
CLR-1: introduce the fundamentals of Reinforcement Learning
CLR-2: illustrate model-based prediction and control using dynamic programming
CLR-3: illustrate model-free prediction and control
CLR-4: introduce planning and learning with tabular methods
CLR-5: explain approximation of a value function

Course Outcomes (CO): At the end of this course, learners will be able to:
CO-1: understand basic concepts of reinforcement learning
CO-2: perform model-based prediction and control using dynamic programming
CO-3: apply model-free prediction and control
CO-4: comprehend the use of tabular methods
CO-5: understand how a value function can be approximated

CO mapping to Program Outcomes (PO-1 to PO-12: Engineering Knowledge; Problem Analysis; Design/Development of Solutions; Conduct Investigations of Complex Problems; Modern Tool Usage; The Engineer and Society; Environment & Sustainability; Ethics; Individual & Team Work; Communication; Project Mgt. & Finance; Life Long Learning) and Program Specific Outcomes (PSO-1 to PSO-3); "-" means not mapped:

CO-1: PO-1: 3, PO-2: 2, PO-4: 2; PSO-3: 2
CO-2: PO-1: 3, PO-2: 3, PO-4: 3; PSO-3: 2
CO-3: PO-1: 3, PO-2: 3, PO-4: 3; PSO-3: 3
CO-4: PO-1: 3, PO-2: 3, PO-4: 3; PSO-3: 3
CO-5: PO-1: 3, PO-2: 3, PO-4: 3; PSO-3: 3
(All other PO/PSO entries are unmapped.)

Unit-1 - Introduction 9 Hour


Introduction to Reinforcement learning, examples - Elements of reinforcement learning - Limitations and Scope- An extended example - multi-armed bandits - k-armed bandit problem - action-value methods - the
10-armed testbed - incremental implementation - tracking a nonstationary problem - optimistic initial values - upper-confidence-bound action selection - associative search (contextual bandits)
T1: Implementing the 10-armed testbed
T2: Comparing performance for different values of ε
T3: Upper-confidence-bound action selection performance comparison with ε-greedy
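Tutorial task T1 can be sketched in a few lines of Python using the incremental sample-average update covered in this unit; the function name `run_bandit`, the seed, and the parameter values (k = 10, 1000 steps, ε = 0.1) are illustrative choices, not prescribed by the syllabus:

```python
import random

def run_bandit(k=10, steps=1000, epsilon=0.1, seed=0):
    """ε-greedy action-value method on a k-armed testbed, sample-average updates."""
    rng = random.Random(seed)
    q_true = [rng.gauss(0, 1) for _ in range(k)]   # true action values q*(a)
    q_est = [0.0] * k                              # incremental estimates Q(a)
    counts = [0] * k
    rewards = []
    for _ in range(steps):
        if rng.random() < epsilon:                 # explore
            a = rng.randrange(k)
        else:                                      # exploit
            a = max(range(k), key=lambda i: q_est[i])
        r = rng.gauss(q_true[a], 1)                # reward ~ N(q*(a), 1)
        counts[a] += 1
        q_est[a] += (r - q_est[a]) / counts[a]     # incremental mean update
        rewards.append(r)
    return q_true, q_est, rewards
```

Averaging the reward streams over many such runs reproduces the learning curves used in T2 and T3.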
Unit-2 - Markov Decision Process and Model-Based Prediction and Control 9 Hour
Finite Markov Decision Process - The Agent–Environment Interface - Goals and Rewards - Returns and Episodes - Unified Notation for Episodic and Continuing Tasks - Policies and Value Functions - Optimal
Policies and Optimal Value Functions - Optimality and Approximation - Dynamic Programming - Policy Evaluation (Prediction) - Policy Improvement - Policy Iteration - Value Iteration - Generalized Policy Iteration -
Efficiency of Dynamic Programming - Asynchronous Dynamic Programming
T4: MDP for Recycling Robot
T5: Policies and value functions for Gridworld example
T6: Policy evaluation for Gridworld example
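Tutorial tasks T5 and T6 rest on iterative policy evaluation; a minimal sketch for the standard 4x4 gridworld (equiprobable random policy, reward -1 per step, terminal states in two opposite corners; these details are assumed here, not fixed by the syllabus):

```python
def policy_evaluation(theta=1e-6, gamma=1.0):
    """In-place iterative policy evaluation for the 4x4 gridworld."""
    V = [0.0] * 16
    terminal = {0, 15}
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    def step(s, move):
        r, c = divmod(s, 4)
        nr, nc = r + move[0], c + move[1]
        if 0 <= nr < 4 and 0 <= nc < 4:
            return nr * 4 + nc
        return s                                   # off-grid moves leave the state unchanged

    while True:
        delta = 0.0
        for s in range(16):
            if s in terminal:
                continue
            # Bellman expectation backup under the equiprobable policy
            v = sum(0.25 * (-1 + gamma * V[step(s, m)]) for m in moves)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V
```

The converged values match the familiar textbook figure, e.g. -14 for states adjacent to a terminal corner.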

139
B.Tech / M.Tech (Integrated) Programmes-Regulations 2021-Volume-11-CSE-Higher Semester Syllabi-Control Copy
Unit-3 - Model-Free Prediction and Control 9 Hour
Model-free learning - Model-free prediction - Monte Carlo methods - Monte Carlo Prediction - Monte Carlo Estimation of Action Values - Temporal-Difference Learning - TD Prediction - Advantages of TD Prediction
Methods - Optimality of TD(0) - n-step Bootstrapping - n-step TD Prediction - n-step Sarsa - Model-free control - Monte Carlo Control - Monte Carlo Control without Exploring Starts - Off policy learning - Importance
sampling - Off-policy Monte Carlo Control - Sarsa: On-policy TD Control - Q-learning: Off-policy TD control
T7: Monte Carlo Policy Evaluation for Blackjack
T8: TD Prediction for Driving Home example
T9: Sarsa vs Q-learning using Cliff Walking example
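T9 compares Sarsa and Q-learning on Cliff Walking; the off-policy Q-learning update itself can be sketched on a much smaller task. The 1-D corridor below and all parameter values are illustrative assumptions:

```python
import random

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a 1-D corridor: states 0..5, start at 0,
    reaching state 5 gives reward +1, every other step 0.
    Actions: 0 = left, 1 = right."""
    rng = random.Random(seed)
    n, goal = 6, 5
    Q = [[0.0, 0.0] for _ in range(n)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # ε-greedy behaviour policy (ties broken toward "right")
            a = rng.randrange(2) if rng.random() < epsilon else int(Q[s][1] >= Q[s][0])
            s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == goal else 0.0
            # off-policy TD target: max over next-state actions
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

The greedy policy extracted from Q moves right in every state, and Q(4, right) converges to the true value 1.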
Unit-4 - Planning and Learning with Tabular Methods 9 Hour
Models and planning - Dyna: Integrated Planning, Acting and Learning - When the model is wrong - Prioritized Sweeping - Real-time Dynamic Programming - Monte Carlo Tree Search
T10: Simple maze using Dyna-Q
T11: Prioritized sweeping on Maze example
T12: Real-time Dynamic Programming for Racetrack example
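The Dyna-Q loop behind T10 and T11 interleaves direct RL, model learning, and planning. A minimal sketch on a 1-D corridor (the task, seed, and parameter values are illustrative assumptions, not the maze from the tutorial):

```python
import random

def dyna_q(episodes=50, planning_steps=10, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Dyna-Q on a 1-D corridor (states 0..5, goal at 5, reward +1 at the goal):
    each real step also triggers `planning_steps` simulated updates replayed
    from a learned deterministic model."""
    rng = random.Random(seed)
    n, goal = 6, 5
    Q = [[0.0, 0.0] for _ in range(n)]
    model = {}                                     # (s, a) -> (r, s')
    for _ in range(episodes):
        s = 0
        while s != goal:
            a = rng.randrange(2) if rng.random() < epsilon else int(Q[s][1] >= Q[s][0])
            s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == goal else 0.0
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])   # (a) direct RL
            model[(s, a)] = (r, s2)                                 # (b) model learning
            for _ in range(planning_steps):                         # (c) planning
                (ps, pa), (pr, ps2) = rng.choice(list(model.items()))
                Q[ps][pa] += alpha * (pr + gamma * max(Q[ps2]) - Q[ps][pa])
            s = s2
    return Q
```

Because planning replays propagate value backward between real steps, Dyna-Q needs far fewer environment interactions than plain Q-learning, which is the effect T10 is meant to demonstrate.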
Unit-5 - Value Function Approximation 9 Hour
On-policy Prediction with Approximation - Value Function Approximation - The Prediction Objective (VE) - Stochastic-gradient and Semi-gradient Methods - Linear Methods - Least-Squares TD
T13: State aggregation on the 1000-state Random Walk
T14: Bootstrapping on the 1000-state Random Walk
T15: Least squares TD example

Learning Resources
1. Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, 2nd edition, The MIT Press, 2015.
2. Martijn van Otterlo and Marco Wiering, Reinforcement Learning: State-of-the-Art, Springer-Verlag Berlin Heidelberg, 2012.
3. Stuart J. Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd edition, Pearson, 2015.
4. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, The MIT Press, 2016.
5. David Silver, Introduction to Reinforcement Learning (lecture series): https://deepmind.com/learning-resources/-introduction-reinforcement-learning-david-silver
6. Reinforcement Learning with MATLAB, MathWorks Inc., 2020.

Learning Assessment
Continuous Learning Assessment (CLA): CLA-1 (Formative), average of unit tests, 50% weightage; CLA-2 (Life-Long Learning), 10% weightage. Summative: Final Examination, 40% weightage.

Bloom's Level of Thinking   | CLA-1 (50%)       | CLA-2 (10%)       | Final Exam (40%)
                            | Theory  Practice  | Theory  Practice  | Theory  Practice
Level 1: Remember           | 40%     -         | 40%     -         | 40%     -
Level 2: Understand         | 40%     -         | 40%     -         | 40%     -
Level 3: Apply              | 20%     -         | 20%     -         | 20%     -
Level 4: Analyze            | -       -         | -       -         | -       -
Level 5: Evaluate           | -       -         | -       -         | -       -
Level 6: Create             | -       -         | -       -         | -       -
Total                       | 100%              | 100%              | 100%

Course Designers
Experts from Industry:
1. Mr. Ghulam Ahmed Ansari, Applied Research Engineer, LinkedIn
Experts from Higher Technical Institutions:
1. Dr. Manikantan Srinivasan, Adjunct Faculty, CSE, IIT Madras
Internal Experts:
1. Dr. Saad Y. Sait, SRMIST

