Class Meeting 08: Q-Learning in a 5 Room Building Class Exercise Solutions


This page contains the solutions to the Q-learning in a 5-room building class exercise from Class Meeting 08.


\(Q(s_t, a_t)\) after trajectory 0


State  North  South  East  West
rm0    0      0      0     0
rm1    0      0      0     0
rm2    0      0      0     0
rm3    0      0      0     0
rm4    0      100    0     0
rm5    100    0      0     0

\(Q(s_t, a_t)\) after trajectory 1


State  North  South  East  West
rm0    0      0      0     0
rm1    180    0      0     0
rm2    0      0      0     0
rm3    0      0      0     0
rm4    0      100    0     0
rm5    100    180    0     0

\(Q(s_t, a_t)\) after trajectory 2


State  North  South  East  West
rm0    0      80     0     0
rm1    180    0      0     0
rm2    0      0      0     0
rm3    0      0      0     80
rm4    0      244    0     0
rm5    100    180    0     0

\(Q(s_t, a_t)\) after trajectory 3


State  North  South  East    West
rm0    0      80     0       0
rm1    244    0      0       0
rm2    0      0      0       64
rm3    144    0      0       195.2
rm4    0      244    156.16  0
rm5    100    180    0       0
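
The nonzero entries in the tables above are numerically consistent with the deterministic Q-learning update \(Q(s_t, a_t) \leftarrow r_t + \gamma \max_a Q(s_{t+1}, a)\), using a discount factor \(\gamma = 0.8\), a learning rate of 1, and a reward of 100 on some transitions. None of these parameters is stated on this page; they are assumptions inferred from the numbers, and the function name `q_update` is hypothetical. A minimal sketch that reproduces a few of the table values under those assumptions:

```python
GAMMA = 0.8  # assumed discount factor, inferred from the table values


def q_update(reward, next_state_q_values, gamma=GAMMA):
    """Deterministic Q-learning update with an assumed learning rate of 1."""
    return reward + gamma * max(next_state_q_values)


# Reward 100, next state's row is still all zeros (values first seen after trajectory 0).
first = q_update(100, [0, 0, 0, 0])

# Reward 100, next state's best value is 100 (the 180 entries after trajectory 1).
second = q_update(100, [100, 0, 0, 0])

# Reward 0, next state's best value is 244 (the 195.2 entry after trajectory 3).
third = q_update(0, [0, 244, 0, 0])

# Reward 0, next state's best value is 195.2 (the 156.16 entry after trajectory 3).
fourth = q_update(0, [144, 0, 0, 195.2])

for value in (first, second, third, fourth):
    print(round(value, 2))
```

The values are rounded before printing because 0.8 is not exactly representable in floating point. Which room each action leads to depends on the building map from the exercise, which is not reproduced here.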

\(\pi(s)\) after trajectory 3


State  \(a\)
rm0    South
rm1    North
rm2    West
rm3    West
rm4    South
rm5    South
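
The policy above matches what you get by picking, for each room, the action with the largest Q-value in the table after trajectory 3 (a greedy policy). The sketch below shows that extraction; the Q-values are copied from that table, while the names `ACTIONS`, `Q_AFTER_TRAJECTORY_3`, and `greedy_policy` are hypothetical and chosen only for illustration.

```python
ACTIONS = ["North", "South", "East", "West"]

# Q-values after trajectory 3, one row per state, columns ordered as in ACTIONS.
Q_AFTER_TRAJECTORY_3 = {
    "rm0": [0, 80, 0, 0],
    "rm1": [244, 0, 0, 0],
    "rm2": [0, 0, 0, 64],
    "rm3": [144, 0, 0, 195.2],
    "rm4": [0, 244, 156.16, 0],
    "rm5": [100, 180, 0, 0],
}


def greedy_policy(q_table):
    """Return the highest-valued action for each state.

    Ties go to the first action in ACTIONS order; no row in this table has a tie
    for its maximum, so the tie-breaking rule does not affect the result here.
    """
    policy = {}
    for state, values in q_table.items():
        best_index = max(range(len(values)), key=lambda i: values[i])
        policy[state] = ACTIONS[best_index]
    return policy


policy = greedy_policy(Q_AFTER_TRAJECTORY_3)
for state, action in policy.items():
    print(state, action)
```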