Class Meeting 09: Q-Learning in a 5 Room Building Class Exercise Solutions


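This page contains the solutions for the Q-learning in a 5-room building class exercise from Class Meeting 09. Each table below shows the Q-values after one trajectory, with one row per room and one column per action.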


\(Q(s_t, a_t)\) after trajectory 0


Room  North  South  East  West
rm0   -1     0      -1    -1
rm1   0      0      -1    -1
rm2   -1     -1     -1    0
rm3   0      -1     0     0
rm4   0      100    0     -1
rm5   100    0      0     0

\(Q(s_t, a_t)\) after trajectory 1


Room  North  South  East  West
rm0   -1     0      -1    -1
rm1   180    0      -1    -1
rm2   -1     -1     -1    0
rm3   0      -1     0     0
rm4   0      100    0     -1
rm5   100    180    0     0
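
As a worked check, the two entries that change in this table (rm1 North and rm5 South) are consistent with a deterministic Q-learning update that uses a reward of 100 and a discount factor of \(\gamma = 0.8\). Both of those values are assumptions inferred from the numbers in the tables; this page does not state them.

\[
Q(s_t, a_t) \leftarrow r_t + \gamma \max_{a} Q(s_{t+1}, a) = 100 + 0.8 \cdot 100 = 180
\]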

\(Q(s_t, a_t)\) after trajectory 2


Room  North  South  East  West
rm0   -1     80     -1    -1
rm1   180    0      -1    -1
rm2   -1     -1     -1    0
rm3   0      -1     0     80
rm4   0      244    0     -1
rm5   100    180    0     0

\(Q(s_t, a_t)\) after trajectory 3


Room  North  South  East    West
rm0   -1     80     -1      -1
rm1   244    0      -1      -1
rm2   -1     -1     -1      64
rm3   144    -1     0       195.2
rm4   0      244    156.16  -1
rm5   100    180    0       0
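
The sketch below shows one way to reproduce several trajectory 3 entries by hand. It assumes a deterministic update of the form reward plus discounted best next-state value, with a discount factor of 0.8 and a reward of 100 for the move into rm5. None of these are stated on this page; they are inferred because they reproduce the numbers above. The function name `q_update` and the choice of which room each move leads to are likewise hypothetical.

```python
GAMMA = 0.8  # assumed discount factor, inferred from the table values


def q_update(reward, next_state_q_values, gamma=GAMMA):
    """Return reward + gamma * (largest Q-value available in the next state)."""
    return reward + gamma * max(next_state_q_values)


# Rows are [North, South, East, West], copied from the trajectory 2 table.
rm5_row = [100, 180, 0, 0]
rm4_row = [0, 244, 0, -1]

# rm1 North, assumed to lead into rm5 with a reward of 100.
rm1_north = q_update(100, rm5_row)

# rm3 West, assumed to lead into rm4 with a reward of 0.
rm3_west = q_update(0, rm4_row)

# rm4 East, assumed to lead into rm3 after rm3 West has been updated.
rm3_row_after_update = [0, -1, 0, rm3_west]
rm4_east = q_update(0, rm3_row_after_update)

# Round to two decimals to hide floating-point noise.
print(round(rm1_north, 2), round(rm3_west, 2), round(rm4_east, 2))
```

Under these assumptions the three results match the rm1 North, rm3 West, and rm4 East entries in the trajectory 3 table. Later updates depend on earlier ones within the same trajectory, so the order in which the moves are processed matters.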