Reinforcement Learning
Prerequisite (02)
Probability distributions and expected values,
basic linear algebra (e.g., inner products).
1 Introduction to Reinforcement Learning: (04)
Reinforcement Learning:
Key features and Elements of RL,
Types of RL, rewards.
Algorithms: Q-Learning,
State Action Reward State Action (SARSA)

2 Bandit problems and online learning: (07)
An n-Armed Bandit Problem,
Action-Value Methods,
Tracking a Nonstationary Problem,
Optimistic Initial Values,
Upper-Confidence-Bound Action Selection,
Gradient Bandits

3 Markov Decision Processes: (07)
The Agent–Environment Interface,
Goals and Rewards, Returns,
Markov properties,
Markov Decision Process,
Value Functions,
Optimal Value Functions

4 Dynamic Programming: (07)
Policy Evaluation (Prediction),
Policy Improvement,
Policy Iteration, Value Iteration,
Asynchronous Dynamic Programming,
Generalized Policy Iteration

5 Monte Carlo Methods and Temporal-Difference Learning: (07)
Monte Carlo Prediction,
Monte Carlo Estimation of Action Values,
Monte Carlo Control,
TD Prediction, TD control using Q-Learning

6 Applications and Case Studies: (05)
Elevator Dispatching,
Dynamic Channel Allocation,
Job-Shop Scheduling

Reinforcement Learning - YouTube
Reinforcement Learning (Machine Learning) - YouTube