Dlvu Lecture09
dlvu.github.io

[Figure labels: environment, policy \pi_\theta, learner, state s_t, reward r_t]
This loop happens for every timestep t. The environment presents a state
s_t to the policy \pi_\theta, which then chooses an action a_t. The
environment does some magic, and chooses the next state s_{t+1}. It also
returns a reward r_{t+1} for that time step, which could be received
because the agent achieves some goal in the environment. This reward is
then used in the learner to update the policy parameters.
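
The loop described above can be sketched in a few lines of Python. This is
only a minimal illustration under stated assumptions: the LineWorld
environment, the random_policy stand-in for \pi_\theta, and the run_loop
helper are all hypothetical names invented for this sketch, and the learner
update is left out (the history list is simply what a learner would consume).

```python
import random


class LineWorld:
    """Hypothetical toy environment: positions 0..4 on a line, goal at 4."""

    def __init__(self):
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to [0, 4]
        self.position = min(4, max(0, self.position + action))
        reward = 1.0 if self.position == 4 else 0.0
        return self.position, reward


def random_policy(state, rng):
    """Stand-in for pi_theta: ignores the state, picks an action uniformly."""
    return rng.choice([-1, 1])


def run_loop(env, policy, num_steps, seed=0):
    rng = random.Random(seed)
    state = env.reset()                          # s_0
    history = []
    for t in range(num_steps):
        action = policy(state, rng)              # policy picks a_t given s_t
        next_state, reward = env.step(action)    # env returns s_{t+1}, r_{t+1}
        history.append((state, action, reward))  # data a learner would use
        state = next_state
    return history


history = run_loop(LineWorld(), random_policy, num_steps=20)
print(len(history), sum(r for _, _, r in history))
```

Keeping the policy as a plain function of the state means the loop itself
would not need to change if the random stand-in were replaced by a learned
policy.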

CART POLE
[Figure labels: physics engine, policy \pi_\theta, learner, left or right, angle of pole]
Here, we present an example of an RL environment: cart pole balancing. Our
agent, the funny object on the left of the screen, is a cart that has to
balance the wooden stick, the pole, so that it remains upright.
To encode the state, it is sufficient to pass the angle of the pole, though
additional features can be thought of! Our agent uses its policy to decide
whether the cart should be moved to the left or the right, so the pole remains
upright.
It receives a positive reward if the pole is upright, and otherwise receives no
reward. The reward is used in the learner to update the parameters.
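
The state, action and reward of the cart pole task can be written down as a
small sketch. Several things here are assumptions made for illustration and do
not come from the lecture: the angle is measured in radians with 0 meaning
vertical and negative values meaning the pole leans left, the 12 degree
"upright" threshold is arbitrary, and hand_written_policy is a fixed rule
standing in for the learned policy \pi_\theta.

```python
import math

# Assumed threshold for "upright"; the lecture does not give a number.
UPRIGHT_LIMIT = math.radians(12)


def reward(angle):
    """Positive reward while the pole is upright, no reward otherwise.

    angle: pole angle in radians, 0 = vertical (assumed convention).
    """
    return 1.0 if abs(angle) < UPRIGHT_LIMIT else 0.0


def hand_written_policy(angle):
    """Illustrative rule, not a learned policy: push the cart toward the
    side the pole is leaning to, so the cart moves back under the pole."""
    return "left" if angle < 0 else "right"


for angle in (-0.3, -0.05, 0.0, 0.05, 0.3):
    print(angle, hand_written_policy(angle), reward(angle))
```

A learned policy would replace the fixed rule with a parameterised function
of the same input, whose parameters the learner adjusts using the reward.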

We will also only look at model-free methods for now. There are also
model-based methods for RL, where in addition to learning the policy, we also
learn a model of the environment! This is also an exciting and active research
area.
This lecture introduces RL and explains the basic Deep RL algorithm, while the
second lecture on RL (lecture 12) will focus on more recent popular methods
for Deep RL.

DEEPMIND ATARI
One benefit of RL is that a single system can be developed for many different
tasks, so long as the interface between the world and the learner stays the
same. Here is a famous experiment by DeepMind, the company behind AlphaGo. The
environment is an Atari simulator. The state is a single image, containing
everything that can be seen on the screen. The actions are the four possible
movements of the joystick and the pressing of the fire button. The reward is
determined by the score shown on the screen.
The amazing thing here is that the system was not pre-programmed with any
knowledge of any of the games. For several of the games the system learned to
play the game better than the top human performance.
https://www.youtube.com/watch?v=VCdxqn0fcnE
source: https://www.youtube.com/watch?v=V1eYniJ0Rnk

ATARI POLICY
[Figure labels: state s, action a, Left]
Here, we see an example of what a Deep RL policy might look like. As we
mentioned, the states in our Atari game are simple images, so a sensible idea
is to use a CNN! This CNN policy takes the current state of the game, does a
lot of hard neural network computation, and computes a probability
distribution over actions! Since we have a finite set of actions, we can use a
softmax output layer to create a categorical distribution over actions. Then,
we sample an action to perform from this distribution.
We won’t discuss policy network architectures any further in this lecture. For
choosing a network architecture, generally the same recommendations apply
as for normal deep learning: Use CNNs for states represented as images,
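
The softmax output layer and the sampling step described in the policy notes
above can be sketched without any deep learning library. The CNN itself is
omitted: the logits list is a made-up stand-in for its final-layer outputs,
and the action labels are an assumed reading of "four joystick movements plus
the fire button".

```python
import math
import random

# Assumed labels for the five Atari actions mentioned in the notes.
ACTIONS = ["left", "right", "up", "down", "fire"]


def softmax(logits):
    """Turn raw scores into a categorical distribution over actions."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Sample one action according to the softmax probabilities."""
    probs = softmax(logits)
    return rng.choices(ACTIONS, weights=probs, k=1)[0]


logits = [2.0, 0.5, 0.1, 0.1, -1.0]  # made-up network outputs for one state
probs = softmax(logits)
action = sample_action(logits, random.Random(0))
print([round(p, 3) for p in probs], action)
```

Sampling, rather than always taking the most probable action, means the
agent occasionally tries actions the policy currently rates lower.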

interacting with the environment and receiving rewards. It starts at the initial
state s_0, and ends in the terminal state s_T, after which the agent stops
interacting with the environment.
• States s_t
• Rewards r_t
• Initial state s_0
• Terminal state s_T
To show this setup, let’s again look at the atari example. We see three states,
ATARI TRAJECTORY represented using images. These go into our CNN policy, and we sample an
action from them, in this case left (we see the bat go to the left). The ball hits
rt+1 = 0 rt+2 = 1 a block on the third picture, which increases the score by one! That is a
<latexit sha1_base64="6BZVX3qWN3a5GYVMCDT6XJYIKOY=">AAAOS3icfZdNb9s2HMbVbl27bG7T7biLsKDAsAWBlDh+OQSoJdvoYW2zIG9bbAQUTcuCKZEgKceuoE+y6/Zt9gX2NXYcdhglO7JESdYlDJ+Hj3/8UxRIh2KPC8P4+8nTzz5/9sXzF1/uffV14+Wr/dffXHMSMoiuIMGE3TqAI+wF6Ep4AqNbyhDwHYxunLmd6DcLxLhHgkuxomjsAzfwph4EQnbd778aMfQA2OQ+Ej+Z8Zlxv39gHBnpo5cb5qZxoG2e8/vXjYPRhMDQR4GAGHB+ZxpUjCPAhAcxivdGIUcUwDlw0V0opp1x5AU0FCiAsf5GatMQ64LoCZw+8RiCAq9kA0DmyQQdzgADUMgp7BWjOAqAj/jhZOFRvm7yhbtuCCDnP46WaX3iwsDIZYDOPLgskEXA5z4Qs1InX/lOsROFGLGFX+xMKCWj4lwiBj2e1OBcFuYjTUrOL8n5Rp+t6AwFPI5ChuP8QCkgxtBUDkybHImQRulk5DrP+ZlgITpMmmnfWR+w+QWaHMqcQkcRZ4oJEMUux0+KE6AHSHwfBJNoRONoJNBSRKPDo1iKb3QHS7ND5DtSdF7EUTRKauY4+oW0FsQPOfGDKg5y4mDzIwRP9Clh+kKuP2Fcl0ZdWpgHES+OvspGT/UrNfo6J16r4k1OvFFFJ8ypYUld5NRFSX3IqQ+qusyJS1Vc5cSVKn7KiZ9U8XZX7K+7Yn9TYmX95aZY7Y0maCo/I+krFM1hHL27fP9zHHXTZ/MuhEg3i0boPBpPhq12tx2rMn7Um8OOafXLemZoWz3TrjJkjl7bNvqDLcux4s2gDaM16LXUKIi3eqdrD8v6FtbotftWhWFLO7Sbg/amfAgFitXNfHa31SyVxc1yupZlnXbLemawmrbdOa4wZA673+/37BSFhoxipHjpo7HVOjXKUTQL6hitZq9C3y6A0bGsEizNsVhDy+ybKYtAACtOkb0sdqfXHahBYlt/q2fbpQUUufJ3bLPfrDBsWU/7p4OTlIQwELhqVUhWvla7c9JRo0gWNGzLFSyxkO0vDbuW0S6xkBzL0LKtpLDFnSg32Z05jtaf3O2+0w9MPY5Vs9xoqjnZe49mxYsrzLjeXWnf5a8egGvhyzOFsC4eVoTDWhhYxQLr4WElPNwF75b9bl28WxHu1sK4VSxuPbxbCe/ugqdlP62LpxXhtBaGVrHQenhaCU93wYuyX9TFi4pwUQsjqlhEPbyohBe74EnZT+riSUU4qYUhVSykHp5UwpMd8OtrQXJSkG9XxEqR8kwuT7OpDnEESjoXQKBUJjjiJdkRMyRAoju+ZEr/KXtAmDlkU9UnHockDMQGEuNo5AKpxeszDU2uIADryRGeYN0LNqecwgeYJoPnUB581+51IfpIXmUYei/PSB/l+RvII+mPcsrM9T05Zfl3dJi0dhnB8tEoW3vyWmWql6hy4+b4yGwemeYvzYO3rc0N64X2nfa99oNmam3trfZOO9euNKiF2u/aH9qfjb8a/zT+bfy3tj59shnzrVZ4Xj77H4vTSr0=</latexit> <latexit 
sha1_base64="0Xz+aHDaXnZyuOfTbAOjKslbg60=">AAAOS3icfZdNb9s2HMbVbl27bG7T7biLsKDAsAWBlDh+OQSoJdvoYW2zIG9bbAQUTcuCKZGgKMeuoE+y6/Zt9gX2NXYcdhglO7JEStYlDJ+Hj3/8UxRIh2Iv5Ibx95Onn33+7IvnL77c++rrxstX+6+/uQ5JxCC6ggQTduuAEGEvQFfc4xjdUoaA72B048ztVL9ZIBZ6JLjkK4rGPnADb+pBwEXX/f6rEUMPgE3uY/7TcXJm3u8fGEdG9uhqw9w0DrTNc37/unEwmhAY+SjgEIMwvDMNyscxYNyDGCV7oyhEFMA5cNFdxKedcewFNOIogIn+RmjTCOuc6CmcPvEYghyvRANA5okEHc4AA5CLKeyVo0IUAB+Fh5OFR8N1M1y46wYHYv7jeJnVJykNjF0G6MyDyxJZDPzQB3ymdIYr3yl3oggjtvDLnSmlYJScS8SgF6Y1OBeF+UjTkoeX5Hyjz1Z0hoIwiSOGk+JAISDG0FQMzJoh4hGNs8mIdZ6HZ5xF6DBtZn1nfcDmF2hyKHJKHWWcKSaAl7scPy1OgB4g8X0QTOIRTeIRR0sejw6PEiG+0R0szA4R70jZeZHE8SitmePoF8JaEj8UxA+yOCiIg82PEDzRp4TpC7H+hIW6MOrCwjyIwvLoq3z0VL+So68L4rUs3hTEG1l0ooIaKeqioC4U9aGgPsjqsiAuZXFVEFey+KkgfpLF212xv+6K/U2KFfUXm2K1N5qgqfiMZK9QPIdJ/O7y/c9J3M2ezbsQId0sG6HzaDwZttrddiLL+FFvDjum1Vf13NC2eqZdZcgdvbZt9AdblmPJm0MbRmvQa8lREG/1TtceqvoW1ui1+1aFYUs7tJuD9qZ8CAWS1c19drfVVMri5jldy7JOu6qeG6ymbXeOKwy5w+73+z07Q6ERoxhJXvpobLVODTWK5kEdo9XsVejbBTA6lqXA0gKLNbTMvpmxcASw5OT5y2J3et2BHMS39bd6tq0sIC+Uv2Ob/WaFYct62j8dnGQkhIHAlatC8vK12p2TjhxF8qBhW6ygwkK2vzTsWkZbYSEFlqFlW2lhyztRbLI7cxyvP7nbfacfmHqSyGax0WRzuvcezZIXV5hxvbvSvstfPQDXwqszhbAuHlaEw1oYWMUC6+FhJTzcBe+qfrcu3q0Id2th3CoWtx7erYR3d8FT1U/r4mlFOK2FoVUstB6eVsLTXfBc9fO6eF4RzmtheBULr4fnlfB8FzxR/aQunlSEk1oYUsVC6uFJJTzZAb++FqQnBfF2xUyJFGdycZrNdIhjoOghBxxlMsFxqMgOnyEOUt3xBVP2j+oBUe4QTVmfeCEkUcA3kBjHIxcILVmfaWh6BQFYT4/wBOtesDnllD7ANB08h+Lgu3avC9FH4irD0HtxRvoozt9AHEl/FFNmru+JKYu/o8O0tcsIlo9G0doT1ypTvkSpjZvjI7N5ZJq/NA/etjY3rBfad9r32g+aqbW1t9o77Vy70qAWab9rf2h/Nv5q/NP4t/Hf2vr0yWbMt1rpefnsf6eESr8=</latexit>
positive reward.
<latexit sha1_base64="gfi6SL5j681Wde1MvJUNdF/DmQ4=">AAAOW3icfZdNb9s2HMbV7q3LmizdsNMOExYU6IYgkBLHL4cCtWQbPaxtFuRti4KAomlZMCUSFOXY1XTcp9l1+zAD9mFGyY4sUZR1yT98Hj7+6S9RIF2K/Ygbxr9Pnn7y6Weff/Hsy52vnu/ufb3/4puriMQMoktIMGE3LogQ9kN0yX2O0Q1lCAQuRtfuzM706zlikU/CC76k6C4AXuhPfAi4GLrf/8Gh/r3j8ini4JUDYD7K/3AiDji65z/d7x8YR0Z+6fXCXBcH2vo6u3+xe+CMCYwDFHKIQRTdmgbldwlg3IcYpTtOHCEK4Ax46Dbmk+5d4oc05iiEqf5SaJMY65zoGaw+9hmCHC9FASDzRYIOp4AJTHFLO9WoCIUgQNHheO7TaFVGc29VcCD6cZcs8n6llYmJxwCd+nBRIUtAEAWAT2uD0TJwq4MoxojNg+pgRikYJecCMehHWQ/ORGM+0KzZ0QU5W+vTJZ2iMEqTmOG0PFEIiDE0ERPzMkI8pkl+M+K5z6LXnMXoMCvzsdcDwGbnaHwocioDVZwJJoBXh9wga06IHiAJAhCOE4emicPRgifO4VEqxJe6i4XZJYCNq87zNEmcrGeuq58La0V8XxLfy+KwJA7XP0LwWJ8Qps/F8ycs0oVRFxbmQxRVZ18Wsyf6pRx9VRKvZPG6JF7LohuX1LimzkvqvKY+lNQHWV2UxIUsLkviUhY/lsSPsnizLfa3bbG/S7Gi/2JRLHecMZqIz0r+CiUzmCZvL979kia9/Fq/CzHSzaoRuo/Gk1G70+uksowf9daoa1qDul4YOlbftFWGwtHv2MZguGE5lrwFtGG0h/22HAXxRu/27FFd38Aa/c7AUhg2tCO7Neys24dQKFm9wmf32q1aW7wip2dZ1mmvrhcGq2Xb3WOFoXDYg8Ggb+coNGYUI8lLH43t9qlRj6JFUNdot/oKffMAjK5l1WBpicUaWebAzFk4Alhy8uJlsbv93lAO4pv+W33brj1AXmp/1zYHLYVhw3o6OB2e5CSEgdCTu0KK9rU73ZOuHEWKoFFHPMEaC9n80qhnGZ0aCymxjCzbyhpbXYlikd2ad8nqk7tZd/qBqaepbBYLTTZna+/RLHmxwoyb3Ur7Nr96Am6Er98phE3xUBEOG2GgigU2w0MlPNwG79X9XlO8pwj3GmE8FYvXDO8p4b1t8LTup03xVBFOG2GoioU2w1MlPN0Gz+t+3hTPFeG8EYarWHgzPFfC823wpO4nTfFEEU4aYYiKhTTDEyU82QLP0IPY8mU7BfF2JawWuTo65DrECajp+YEilwlOopq8OoFkuhsIpvyfugfEhUOUsj72I0jikK8hMU4cDwgtXe1paHYEAVjPtvAE63643uVUPsA0mzyDYuO7cq8aMUDiKMPQO7FH+iD230BsSX8Wt8y8wBe3LP46h1m1zQgWj0ZR7YhjlSkfourF9fGR2ToyzV9bB2/a6xPWM+177UftlWZqHe2N9lY70y41qP2p/aX9rf2z+9/e072dvecr69Mn6znfapVr77v/ARAATx0=</latexit> <latexit 
sha1_base64="LQ65ew/VbtFCIOFTc/0FK50zRhc=">AAAOY3icfZfdbts2GIbV7q/LliztdjYMEBYU67YgkBLHPwcFask2erC2WZA02aIgoGhaFkyJBEU5djVdwq5mp9uF7HwXMkp2ZImUrBN/5vvy9aNPokG6FPsRN4x/Hz3+6ONPPv3syec7X3y5u/fV/tNn7yMSM4guIcGEXbsgQtgP0SX3OUbXlCEQuBhduTM706/miEU+CS/4kqLbAHihP/Eh4GLobv8Hh/p3jsuniIMXDoD5aMJ/NtM/nIgDjlZffrzbPzCOjPzS1cJcFwfa+jq7e7p74IwJjAMUcohBFN2YBuW3CWDchxilO04cIQrgDHjoJuaT7m3ihzTmKISp/lxokxjrnOgZtD72GYIcL0UBIPNFgg6ngAlccWs71agIhSBA0eF47tNoVUZzb1VwIPpymyzyvqWViYnHAJ36cFEhS0AQBYBPlcFoGbjVQRRjxOZBdTCjFIySc4EY9KOsB2eiMe9o1vTogpyt9emSTlEYpUnMcFqeKATEGJqIiXkZIR7TJL8Z8fxn0UvOYnSYlfnYywFgs3M0PhQ5lYEqzgQTwKtDbpA1J0T3kAQBCMeJQ9PE4WjBE+fwKBXic93FwuwSwMZV53maJE7WM9fVz4W1Ir4tiW9lcVgSh+sfIXisTwjT5+L5ExbpwqgLC/MhiqqzL4vZE/1Sjn5fEt/L4lVJvJJFNy6psaLOS+pcUe9L6r2sLkriQhaXJXEpix9K4gdZvN4W+9u22N+lWNF/sSiWO84YTcTfS/4KJTOYJq8v3vySJr38Wr8LMdLNqhG6D8aTUbvT66SyjB/01qhrWgNVLwwdq2/adYbC0e/YxmC4YTmWvAW0YbSH/bYcBfFG7/bskapvYI1+Z2DVGDa0I7s17Kzbh1AoWb3CZ/faLaUtXpHTsyzrtKfqhcFq2Xb3uMZQOOzBYNC3cxQaM4qR5KUPxnb71FCjaBHUNdqtfo2+eQBG17IUWFpisUaWOTBzFo4Alpy8eFnsbr83lIP4pv9W37aVB8hL7e/a5qBVY9iwng5Ohyc5CWEg9OSukKJ97U73pCtHkSJo1BFPUGEhm18a9Syjo7CQEsvIsq2ssdWVKBbZjXmbrP5yN+tOPzD1NJXNYqHJ5mztPZglL64x42Z3rX2bv34CboRX7xTCpnhYEw4bYWAdC2yGh7XwcBu8p/q9pnivJtxrhPHqWLxmeK8W3tsGT1U/bYqnNeG0EYbWsdBmeFoLT7fBc9XPm+J5TThvhOF1LLwZntfC823wRPWTpnhSE04aYUgdC2mGJ7XwZAs8Q/diy5ftFMTblTAlcnWEyHWIE6Do+aEilwlOIkVenUQy3Q0EU/5F9YC4cIhS1sd+BEkc8jUkxonjAaGlqz0NzY4gAOvZFp5g3Q/Xu5zKHzDNJs+g2Piu3KtGDJA4yjD0RuyR3on9NxBb0p/ELTMv8MUti0/nMKu2GcHiwSiqHXGsMuVDlFpcHR+ZrSPT/LV18Kq9PmE90b7VvtdeaKbW0V5pr7Uz7VKD2p/aX9rf2j+7/+3t7D3b+2ZlffxoPedrrXLtffc/yxxSFQ==</latexit>
⇡✓ (at |st ) ⇡✓ (at+1 |st+1 ) This brings into question the credit assignment problem: what was it that
caused this positive reward? It probably was not bat moving left, but rather,
at = Left
<latexit sha1_base64="+LzeGFZxpTjAKZV/HNGMwJ8Fw9M=">AAAOQ3icfZdNb9s2HMbV7q3L5rXdjrsICwoMQxBIieOXQ4Faso0e1jYL8tbVQUDRtCyEEgmKcuwK+hS7bt9mX2JfYcdh1wGjZEeWSMq6+G8+Dx//9JdokB7FQcwt669Hjz/59LPPv3jy5d5XX7e+efrs+beXMUkYRBeQYMKuPRAjHEToggcco2vKEAg9jK68OzfXrxaIxQGJzvmKopsQ+FEwCyDgYuj9BMD885bfPtu3Dq3iMtXC3hT7xuY6vX3e2p9MCUxCFHGIQRx/sC3Kb1LAeAAxyvYmSYwogHfARx8SPuvdpEFEE44imJkvhDZLsMmJmUOZ04AhyPFKFACyQCSYcA6YgBPoe/WoGEUgRPHBdBHQeF3GC39dcCDu+yZdFn3JahNTnwE6D+CyRpaCMA4BnyuD8Sr06oMowYgtwvpgTikYJecSMRjEeQ9ORWPe0bzF8Tk53ejzFZ2jKM7ShOGsOlEIiDE0ExOLMkY8oWlxM+L53sUvOUvQQV4WYy+HgN2doemByKkN1HFmmABeH/LCvDkRuockDEE0TSc0SyccLXk6OTjMhPjC9LAwewSwad15lqXpJO+Z55lnwloT31bEt7I4qoijzY8QPDVnhJkL8fwJi01hNIWFBRDF9dkX5eyZeSFHX1bES1m8qohXsuglFTVR1EVFXSjqfUW9l9VlRVzK4qoirmTxY0X8KIvXu2Lf74r9VYoV/ReLYrU3maKZ+PsoXqH0Dmbp6/M3P2dpv7g270KCTLtuhN6D8Xjc6fa7mSzjB7097tnOUNVLQ9cZ2K7OUDoGXdcajrYsR5K3hLaszmjQkaMg3uq9vjtW9S2sNegOHY1hSzt226Pupn0IRZLVL31uv9NW2uKXOX3HcU76ql4anLbr9o40htLhDofDgVug0IRRjCQvfTB2OieWGkXLoJ7VaQ80+vYBWD3HUWBphcUZO/bQLlg4Alhy8vJlcXuD/kgO4tv+OwPXVR4gr7S/59rDtsawZT0ZnoyOCxLCQOTLXSFl+zrd3nFPjiJl0LgrnqDCQra/NO47VldhIRWWseM6eWPrK1Essg/2Tbr+y92uO3PfNrNMNouFJpvztfdglrxYY8bNbq19l18/ATfCq3cKYVM81ITDRhioY4HN8FALD3fB+6rfb4r3NeF+I4yvY/Gb4X0tvL8Lnqp+2hRPNeG0EYbqWGgzPNXC013wXPXzpniuCeeNMFzHwpvhuRae74Inqp80xRNNOGmEIToW0gxPtPBkBzxD92LLl+8UxNuVMiVyfWAodIhToOgxBxwVMsFprMgenyMOct0LBVPxRfWApHSIUtanQQxJEvENJMbpxAdCy9Z7GpofQQA28y08wWYQbXY5tT9gmk++g2Lju3avGzFE4ijD0BuxR3on9t9AbEl/ErfM/DAQtyw+Jwd5tcsIlg9GUe2JY5UtH6LU4uro0G4f2vYv7f1Xnc0J64nxvfGD8aNhG13jlfHaODUuDGiExm/G78YfrT9bf7f+af27tj5+tJnznVG7Wv/9D6soSYc=</latexit>
at+1 = Left
<latexit sha1_base64="5c7YI5DF4pFOIUZakFvff7LcCFY=">AAAOR3icfZfLbuM2GIWV6W2a1tOZdtmN0GCAog0CKXF8WQwwlmxjFp2ZNMitjY2AomlZMCWyFOXYI+g5um3fpq/Ql+iy6LKUbMsSKVkb/+Y5PP70SzRIh2Iv5Ibx98GTjz7+5NPPnn5++MWXjWdfPX/x9U1IIgbRNSSYsDsHhAh7AbrmHsfojjIEfAejW2dup/rtArHQI8EVX1E09oEbeFMPAi6GxiMA08+HmP9oJg/Pj4wTI7t0tTA3xZG2uS4eXjSORhMCIx8FHGIQhvemQfk4Box7EKPkcBSFiAI4By66j/i0M469gEYcBTDRXwptGmGdEz0F0yceQ5DjlSgAZJ5I0OEMMAEo8A/LUSEKgI/C48nCo+G6DBfuuuBA3Ps4Xma9SUoTY5cBOvPgskQWAz/0AZ8pg+HKd8qDKMKILfzyYEopGCXnEjHohWkPLkRj3tO0zeEVudjosxWdoSBM4ojhpDhRCIgxNBUTszJEPKJxdjPiGc/DV5xF6Dgts7FXfcDml2hyLHJKA2WcKSaAl4ccP21OgB4h8X0QTOIRTeIRR0sej45PEiG+1B0szA4BbFJ2XiZxPEp75jj6pbCWxHcF8Z0sDgriYPMjBE/0KWH6Qjx/wkJdGHVhYR5EYXn2dT57ql/L0TcF8UYWbwvirSw6UUGNFHVRUBeK+lhQH2V1WRCXsrgqiCtZ/FAQP8ji3b7YX/bF/irFiv6LRbE6HE3QVPyFZK9QPIdJ/Obq7U9J3M2uzbsQId0sG6GzNZ4NW+1uO5FlvNWbw45p9VU9N7StnmlXGXJHr20b/cGO5VTy5tCG0Rr0WnIUxDu907WHqr6DNXrtvlVh2NEO7eagvWkfQoFkdXOf3W01lba4eU7XsqzzrqrnBqtp253TCkPusPv9fs/OUGjEKEaSl26Nrda5oUbRPKhjtJq9Cn33AIyOZSmwtMBiDS2zb2YsHAEsOXn+stidXncgB/Fd/62ebSsPkBfa37HNfrPCsGM9758PzjISwkDgyl0hefta7c5ZR44iedCwLZ6gwkJ2vzTsWkZbYSEFlqFlW2ljyytRLLJ7cxyv/3J3604/MvUkkc1iocnmdO1tzZIXV5hxvbvSvs9fPQHXwqt3CmFdPKwIh7UwsIoF1sPDSni4D95V/W5dvFsR7tbCuFUsbj28Wwnv7oOnqp/WxdOKcFoLQ6tYaD08rYSn++C56ud18bwinNfC8CoWXg/PK+H5Pnii+kldPKkIJ7UwpIqF1MOTSniyB56hR7HlS3cK4u2KmRK5PjRkOsQxUPSQA44ymeA4VGSHzxAHqe74gin7onpAlDtEKesTL4QkCvgGEuN45AKhJes9DU2PIADr6RaeYN0LNruc0h8wTSfPodj4rt3rRvSROMow9Fbskd6L/TcQW9IfxC0z1/fELYvP0XFa7TOC5dYoqkNxrDLlQ5Ra3J6emM0T0/y5efS6tTlhPdW+1b7TvtdMra291t5oF9q1BrXftN+1P7Q/G381/mn82/hvbX1ysJnzjVa6nh38D+KuSgQ=</latexit>
several actions that were taken quite a few timesteps back, where the bat
reflected the ball. We need to assign credit to exactly the action(s) that
caused the positive reward because it allows us to ‘reinforce' these actions:
These were good choices!
st
Credit assignment: What action causes reward?
https://becominghuman.ai/lets-build-an-atari-ai-part-0-intro-to-rl-9b2c5336e0ec
Markov decision processes (MDPs) model p(\tau \mid \theta)

p(\tau \mid \theta) = p(s_0) \prod_{t=0}^{T-1} \pi_\theta(a_t \mid s_t) \, p(s_{t+1}, r_{t+1} \mid s_t, a_t)

MDPs model p(\tau \mid \theta), the probability of a trajectory \tau under the policy \pi_\theta. This probability is given by the long equation above. We will explain in detail in the coming slides what exactly this means.

The probability of a trajectory is built up from three probability distributions. The first is the policy, which we have seen before. It represents our agent, and defines a distribution over actions given states.
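To make the product in p(\tau \mid \theta) concrete, here is a minimal Python sketch that multiplies the three kinds of factors for a tiny tabular MDP. The states "A" and "B", the actions "left" and "right", and all probabilities are made up for illustration; they are assumptions, not part of the lecture.

```python
# Hypothetical two-state, two-action MDP; every number here is invented.

# p0[s] is the initial state distribution p(s_0).
p0 = {"A": 0.8, "B": 0.2}

# policy[s][a] is pi_theta(a | s).
policy = {
    "A": {"left": 0.5, "right": 0.5},
    "B": {"left": 0.1, "right": 0.9},
}

# dynamics[(s, a)][(s_next, r_next)] is p(s_{t+1}, r_{t+1} | s_t, a_t).
dynamics = {
    ("A", "left"): {("A", 0.0): 1.0},
    ("A", "right"): {("B", 1.0): 0.5, ("A", 0.0): 0.5},
    ("B", "left"): {("A", 0.0): 1.0},
    ("B", "right"): {("B", 1.0): 1.0},
}


def trajectory_probability(s0, steps):
    """Return p(tau | theta) for tau = s0 followed by (a_t, s_{t+1}, r_{t+1}) triples."""
    prob = p0[s0]  # p(s_0)
    s = s0
    for a, s_next, r_next in steps:
        prob *= policy[s][a]  # pi_theta(a_t | s_t)
        # p(s_{t+1}, r_{t+1} | s_t, a_t); impossible transitions get probability 0.
        prob *= dynamics[(s, a)].get((s_next, r_next), 0.0)
        s = s_next
    return prob


# A trajectory of length T = 2 that starts in "A" and moves right twice.
tau = [("right", "B", 1.0), ("right", "B", 1.0)]
print(trajectory_probability("A", tau))
```

In practice one would usually sum log-probabilities instead of multiplying, since the product of many small factors underflows for long trajectories.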
[Figure: a trajectory unrolled over time, with states s_0, s_1, ..., actions a_t ~ \pi_\theta(a_t | s_t) sampled by the agent with parameters \theta, and rewards r_{t+1} returned by the environment.]

First, we generate the initial state s_0 from the initial state distribution. This is passed to the policy network \pi_\theta to get a probability distribution over possible actions. From this distribution, we generate the action a_0.

Following the arrows, we pass the initial state s_0 and the action a_0 to the environment and generate the next state s_1 and the associated reward r_1 for this state.

The process repeats from this point: generate an action from the policy, and use it together with the state to generate the next state and reward.
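The sampling procedure described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tabular MDP (states "A" and "B", actions "left" and "right", all probabilities) and the helper names `sample` and `rollout` are hypothetical, and a real policy network would replace the `policy` table.

```python
import random

# Hypothetical tabular MDP; all names and numbers are invented for illustration.
p0 = {"A": 0.8, "B": 0.2}  # initial state distribution p(s_0)
policy = {  # pi_theta(a | s)
    "A": {"left": 0.5, "right": 0.5},
    "B": {"left": 0.1, "right": 0.9},
}
dynamics = {  # p(s_{t+1}, r_{t+1} | s_t, a_t)
    ("A", "left"): {("A", 0.0): 1.0},
    ("A", "right"): {("B", 1.0): 0.5, ("A", 0.0): 0.5},
    ("B", "left"): {("A", 0.0): 1.0},
    ("B", "right"): {("B", 1.0): 1.0},
}


def sample(dist, rng):
    """Draw one outcome from a dict that maps outcomes to probabilities."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def rollout(T, rng):
    """Sample s_0, then T steps of (a_t, s_{t+1}, r_{t+1})."""
    s0 = sample(p0, rng)  # initial state from the initial state distribution
    s, steps = s0, []
    for _ in range(T):
        a = sample(policy[s], rng)  # the agent picks an action
        s_next, r_next = sample(dynamics[(s, a)], rng)  # the environment responds
        steps.append((a, s_next, r_next))
        s = s_next
    return s0, steps


rng = random.Random(0)  # fixed seed so the sketch is repeatable
s0, steps = rollout(3, rng)
print(s0, steps)
```

Note that the agent only ever touches `policy`, while `p0` and `dynamics` belong to the environment; the learner normally cannot inspect them and only sees their samples.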
OBSERVABILITY
• Full observability of state
• Partial observability: POMDP
• Out of scope!

MDPs rest on two important assumptions. These assumptions are used to develop efficient algorithms, but they are often not a good model of the real world! The first one is full observability of states. We assume that we receive from the environment all there is to know about its current state. An example of an environment with full observability is chess: both our agent and its opponent know everything there is to know about the current state of the game just by observing the board.

This assumption is often wrong. Consider the game of poker, where our agent only knows its own hand and the open cards, and not the hands of the other players. Poker is an example of a partially observable MDP (POMDP), where our agent can only observe a small part of the environment, or its observations are noisy. POMDPs are much more complex to work with than MDPs, but there is a lot of literature out there! For this lecture, however, they are out of scope.
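As a small illustration of partial observability, the following Python sketch separates the full state of a poker-like game from what the agent gets to see. The classes `PokerState` and `Observation`, the function `observe`, and the card strings are all hypothetical names chosen for this example.

```python
from typing import NamedTuple, Tuple


class PokerState(NamedTuple):
    """Full state of a (heavily simplified) two-player poker game."""
    own_hand: Tuple[str, ...]
    opponent_hand: Tuple[str, ...]
    open_cards: Tuple[str, ...]


class Observation(NamedTuple):
    """What the agent actually sees: its own hand and the open cards."""
    own_hand: Tuple[str, ...]
    open_cards: Tuple[str, ...]


def observe(state):
    """Map the full state to the agent's observation, hiding the opponent's hand."""
    return Observation(state.own_hand, state.open_cards)


# Two different states that differ only in the hidden opponent hand.
state_1 = PokerState(("AS", "KS"), ("2H", "7D"), ("QS", "JS", "3C"))
state_2 = PokerState(("AS", "KS"), ("AH", "AD"), ("QS", "JS", "3C"))

# The states differ, but the agent cannot tell them apart.
print(state_1 != state_2, observe(state_1) == observe(state_2))
```

In a fully observable MDP, `observe` would simply return the state itself, so distinct states always give distinct observations.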
MARKOV ASSUMPTION
Markov assumption: s_t independent of history given s_{t-1}:
p(s_t | a_{t-1}, s_1, ..., s_{t-1}) = p(s_t | a_{t-1}, s_{t-1})
• Used to derive strong algorithms!
• Fundamental assumption behind RL
[Figure: rollout diagram with states s_0, s_1, s_2, actions a_0, a_1, a_2 chosen by the policy \pi_\theta, and rewards r_1, r_2.]
The second key assumption, which is maybe even more important than the
previous one, is the Markov assumption. It says that the distribution over the
next state s_t is independent of the history, if we have complete information
about the current state s_{t-1}. In other words, the current state and the chosen
action are all we need to predict the next state (or its distribution). This
condition is very easy to violate: for example, a single static image doesn't
contain the velocity of the objects it shows. Recovering that velocity requires
multiple images, or a different feature space. Often, we can solve such issues by
expanding the features of the state, for example by taking the last few
observations as part of the state.
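The idea of taking the last few observations as part of the state can be sketched in a few lines. This is only an illustration under simple assumptions: the class name ObservationStack, its methods, and the toy "ball on a line" observations are hypothetical and not part of the lecture. A single position says nothing about velocity, but a state made of the last two positions does.

```python
from collections import deque


class ObservationStack:
    """Hypothetical helper: treats the last k observations as the state."""

    def __init__(self, k):
        self.k = k
        # A deque with maxlen=k drops the oldest entry automatically.
        self.frames = deque(maxlen=k)

    def reset(self, first_obs):
        # Fill the buffer with copies of the first observation so the
        # state has a fixed length from the very first step.
        self.frames.clear()
        self.frames.extend([first_obs] * self.k)
        return tuple(self.frames)

    def step(self, obs):
        # Add the newest observation and return the stacked state.
        self.frames.append(obs)
        return tuple(self.frames)


# Toy example (assumed): a ball moving along a line, observed by position only.
positions = [0.0, 1.0, 2.0, 3.0]
stack = ObservationStack(k=2)
state = stack.reset(positions[0])
for p in positions[1:]:
    state = stack.step(p)

# With two observations in the state, the velocity can be recovered as a
# finite difference, which a single observation could not provide.
velocity = state[-1] - state[-2]
```

The choice of k is a design decision: it has to be large enough that the stacked state carries the information the single observation was missing, at the cost of a larger state.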
EXPECTED RETURN
The total reward is simply the sum of all rewards the agent receives during