- Study Chapters 1-9 of Sutton & Barto ("Reinforcement Learning: An Introduction").
- Solve at least half of the exercises to solidify your grasp on the fundamentals.
Go beyond theoretical understanding: read the core papers and implement each of these algorithms yourself.
- DQN
- Rainbow DQN (focus on Double Q-learning and dueling Q-networks)
- REINFORCE (based on the Lil'Log post, not just the original paper)
- A2C (implement fully, read A3C paper)
- PPO (read TRPO paper, but don’t implement TRPO; also read the ICLR blog post on PPO choices)
- DDPG
- TD3
- SAC
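
To make the Double Q-learning item above concrete, here is a minimal sketch of how its bootstrap target differs from the plain DQN target. The function names and the plain-list representation of Q-values are assumptions made for illustration; a real implementation would work on batched tensors.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Plain DQN target: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    # Greedy action according to the online network's estimates.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # The value of that action is read from the target network.
    return reward + gamma * next_q_target[best_action]
```

Decoupling action selection from action evaluation is what reduces the overestimation bias of the `max` operator in the plain target.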
These topics help expand understanding beyond just implementing standard algorithms:
- HER (Hindsight Experience Replay)* → Implement
- Inverse RL → Read 2-3 key papers
- World Models Paper* → Implement
- RIAL & DIAL (J. Foerster) → Read
- MADDPG* → Implement
- Intrinsic Motivation → Read Pathak’s two papers
- Offline RL → Watch Sergey Levine’s talk
- Conservative Q-learning → Read
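
As a starting point for the HER item above, the following is a minimal sketch of hindsight relabeling with the "final" strategy, where every transition of an episode is replayed as if the goal had been whatever the episode actually achieved at its end. The tuple layout and the helper names `sparse_reward` and `her_relabel_final` are hypothetical, and the reward assumes a sparse 0 / -1 signal based on exact equality.

```python
def sparse_reward(achieved, goal):
    """Assumed sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


def her_relabel_final(episode):
    """Relabel an episode with the goal achieved at its last step.

    episode: list of (state, action, achieved_goal, goal) tuples, where
    achieved_goal is what the agent actually reached after taking the action.
    Returns (state, action, achieved_goal, new_goal, reward) tuples.
    """
    new_goal = episode[-1][2]  # pretend the final outcome was the goal all along
    relabeled = []
    for state, action, achieved, _original_goal in episode:
        reward = sparse_reward(achieved, new_goal)
        relabeled.append((state, action, achieved, new_goal, reward))
    return relabeled
```

The relabeled transitions would be stored in the replay buffer alongside the originals, so even a failed episode yields at least one transition with a non-negative reward.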
Once you have a solid foundation, explore these:
- Mirror Learning
- AlphaGo → Learn MCTS (Monte Carlo Tree Search)
- RLHF (Reinforcement Learning from Human Feedback)
- IMPALA
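
For the MCTS item above, a useful first piece to implement is the selection step. The sketch below uses the classic UCB1 (UCT) rule, which is not the exact variant used in AlphaGo (that one also weights exploration by a policy prior). The dictionary layout of a child node and the name `uct_select` are assumptions for illustration.

```python
import math


def uct_select(children, exploration=math.sqrt(2)):
    """Return the index of the child to descend into under the UCB1 rule.

    children: list of dicts with 'visits' (int) and 'value_sum' (float).
    Unvisited children are always tried first.
    """
    total_visits = sum(child["visits"] for child in children)

    def score(child):
        if child["visits"] == 0:
            return math.inf  # force at least one visit per child
        mean_value = child["value_sum"] / child["visits"]
        bonus = exploration * math.sqrt(math.log(total_visits) / child["visits"])
        return mean_value + bonus

    return max(range(len(children)), key=lambda i: score(children[i]))
```

The remaining MCTS phases (expansion, simulation or value estimation, and backup of the result along the visited path) build on this selection step.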