Class Meeting 08: Q-Learning in a 5 Room Building Class Exercise Solutions
This page contains solutions for the Q-learning in a 5-room building class exercise from Class Meeting 08.
\(Q(s_t, a_t)\) after trajectory 0

| | North | South | East | West |
|---|---|---|---|---|
| rm0 | 0 | 0 | 0 | 0 |
| rm1 | 0 | 0 | 0 | 0 |
| rm2 | 0 | 0 | 0 | 0 |
| rm3 | 0 | 0 | 0 | 0 |
| rm4 | 0 | 100 | 0 | 0 |
| rm5 | 100 | 0 | 0 | 0 |
\(Q(s_t, a_t)\) after trajectory 1

| | North | South | East | West |
|---|---|---|---|---|
| rm0 | 0 | 0 | 0 | 0 |
| rm1 | 180 | 0 | 0 | 0 |
| rm2 | 0 | 0 | 0 | 0 |
| rm3 | 0 | 0 | 0 | 0 |
| rm4 | 0 | 100 | 0 | 0 |
| rm5 | 100 | 180 | 0 | 0 |
\(Q(s_t, a_t)\) after trajectory 2

| | North | South | East | West |
|---|---|---|---|---|
| rm0 | 0 | 80 | 0 | 0 |
| rm1 | 180 | 0 | 0 | 0 |
| rm2 | 0 | 0 | 0 | 0 |
| rm3 | 0 | 0 | 0 | 80 |
| rm4 | 0 | 244 | 0 | 0 |
| rm5 | 100 | 180 | 0 | 0 |
\(Q(s_t, a_t)\) after trajectory 3

| | North | South | East | West |
|---|---|---|---|---|
| rm0 | 0 | 80 | 0 | 0 |
| rm1 | 244 | 0 | 0 | 0 |
| rm2 | 0 | 0 | 0 | 64 |
| rm3 | 144 | 0 | 0 | 195.2 |
| rm4 | 0 | 244 | 156.16 | 0 |
| rm5 | 100 | 180 | 0 | 0 |
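
The nonzero entries in the Q tables are consistent with the update \(Q(s_t, a_t) \leftarrow r_t + \gamma \max_{a} Q(s_{t+1}, a)\), using a learning rate of 1 and \(\gamma = 0.8\). Neither value is stated on this page; both are inferred from the numbers. The sketch below is only an arithmetic check: the helper `q_update` and the reward and next-state pairs are hypothetical choices that reproduce some of the table values, not the trajectories from the exercise.

```python
import math

# Assumed discount factor, inferred from the table values (not stated on this page).
GAMMA = 0.8


def q_update(reward, next_q_values, gamma=GAMMA):
    """Q-learning target with learning rate 1: r + gamma * max_a Q(s', a)."""
    return reward + gamma * max(next_q_values)


# Hypothetical (reward, next-state Q row) pairs that reproduce table entries.
# Rows are ordered [North, South, East, West].
checks = {
    180.0: q_update(100, [100, 0, 0, 0]),     # 100 + 0.8 * 100
    244.0: q_update(100, [100, 180, 0, 0]),   # 100 + 0.8 * 180
    195.2: q_update(0, [0, 244, 0, 0]),       # 0.8 * 244
    156.16: q_update(0, [144, 0, 0, 195.2]),  # 0.8 * 195.2
}

for expected, computed in checks.items():
    # isclose guards against floating-point rounding in the products.
    print(expected, math.isclose(expected, computed))
```

The comparison uses `math.isclose` rather than `==` because products such as `0.8 * 244` are not guaranteed to be exact in floating point.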
\(\pi(s)\) after trajectory 3

| | \(a\) |
|---|---|
| rm0 | South |
| rm1 | North |
| rm2 | West |
| rm3 | West |
| rm4 | South |
| rm5 | South |
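
The policy above matches the greedy choice \(\pi(s) = \arg\max_a Q(s, a)\) applied to the Q table after trajectory 3. A minimal sketch of that lookup follows, with the table values copied from this page; the names `ACTIONS`, `Q`, and `greedy_policy` are illustrative and not part of the exercise.

```python
# Column order used in the Q tables on this page.
ACTIONS = ["North", "South", "East", "West"]

# Q(s, a) after trajectory 3, one row per room.
Q = {
    "rm0": [0, 80, 0, 0],
    "rm1": [244, 0, 0, 0],
    "rm2": [0, 0, 0, 64],
    "rm3": [144, 0, 0, 195.2],
    "rm4": [0, 244, 156.16, 0],
    "rm5": [100, 180, 0, 0],
}


def greedy_policy(q_table):
    """Return the action with the largest Q-value in each state.

    Ties go to the first action in ACTIONS order; no row in this
    table has a tie for its maximum.
    """
    policy = {}
    for state, row in q_table.items():
        best_index = max(range(len(row)), key=lambda i: row[i])
        policy[state] = ACTIONS[best_index]
    return policy


policy = greedy_policy(Q)
for state, action in policy.items():
    print(state, action)
```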