Class Meeting 09: Q-Learning in a 5 Room Building Class Exercise Solutions
This page contains solutions for the Q-learning in a 5-room building class exercise from Class Meeting 09.
\(Q(s_t, a_t)\) after trajectory 0

| State | North | South | East | West |
|---|---|---|---|---|
| rm0 | -1 | 0 | -1 | -1 |
| rm1 | 0 | 0 | -1 | -1 |
| rm2 | -1 | -1 | -1 | 0 |
| rm3 | 0 | -1 | 0 | 0 |
| rm4 | 0 | 100 | 0 | -1 |
| rm5 | 100 | 0 | 0 | 0 |

\(Q(s_t, a_t)\) after trajectory 1

| State | North | South | East | West |
|---|---|---|---|---|
| rm0 | -1 | 0 | -1 | -1 |
| rm1 | 180 | 0 | -1 | -1 |
| rm2 | -1 | -1 | -1 | 0 |
| rm3 | 0 | -1 | 0 | 0 |
| rm4 | 0 | 100 | 0 | -1 |
| rm5 | 100 | 180 | 0 | 0 |

\(Q(s_t, a_t)\) after trajectory 2

| State | North | South | East | West |
|---|---|---|---|---|
| rm0 | -1 | 80 | -1 | -1 |
| rm1 | 180 | 0 | -1 | -1 |
| rm2 | -1 | -1 | -1 | 0 |
| rm3 | 0 | -1 | 0 | 80 |
| rm4 | 0 | 244 | 0 | -1 |
| rm5 | 100 | 180 | 0 | 0 |

\(Q(s_t, a_t)\) after trajectory 3

| State | North | South | East | West |
|---|---|---|---|---|
| rm0 | -1 | 80 | -1 | -1 |
| rm1 | 244 | 0 | -1 | -1 |
| rm2 | -1 | -1 | -1 | 64 |
| rm3 | 144 | -1 | 0 | 195.2 |
| rm4 | 0 | 244 | 156.16 | -1 |
| rm5 | 100 | 180 | 0 | 0 |
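
The new entries in these tables are numerically consistent with the update \(Q(s_t, a_t) = r_t + \gamma \max_a Q(s_{t+1}, a)\) with \(\gamma = 0.8\) and a learning rate of 1. This page does not state those parameters, the trajectories, or the building layout, so the sketch below treats all of them as assumptions inferred from the numbers. The helper name `q_update` is hypothetical, and the pairing of each result with a particular next-state row is only one reading that reproduces the table values.

```python
GAMMA = 0.8  # assumed discount factor, inferred from the table values


def q_update(reward, next_state_row, gamma=GAMMA):
    """Return reward + gamma * max over the next state's Q-values.

    Assumes a learning rate of 1, so the old Q-value is overwritten.
    next_state_row is the [North, South, East, West] row of the next state.
    """
    return reward + gamma * max(next_state_row)


# rm5 row after trajectory 0, with an assumed reward of 100:
# reproduces the 180 entries that appear after trajectory 1.
q_180 = q_update(100, [100, 0, 0, 0])

# rm5 row after trajectory 1, with an assumed reward of 100:
# reproduces the 244 entry for rm4 South after trajectory 2.
q_244 = q_update(100, [100, 180, 0, 0])

# rm4 row after trajectory 2, with an assumed reward of 0:
# reproduces the 195.2 entry for rm3 West after trajectory 3.
q_195_2 = q_update(0, [0, 244, 0, -1])

print(round(q_180, 2), round(q_244, 2), round(q_195_2, 2))
```

The same check works for the remaining new entries: 80, 64, 144, and 156.16 are each 0.8 times a value already present in an earlier table.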