Class Meeting 07: Path Finding


Today's Class Meeting


Value Iteration


\(\textrm{Algorithm MDP_discrete_value_iteration}():\)
\( \qquad \textrm{for} \: s_t \in S \: \textrm{do} \)
\( \qquad\qquad V(s_t) = \textrm{min}_{a_t}\big(R(s_t, a_t)\big) \)
\( \qquad \textrm{repeat until convergence} \)
\( \qquad \qquad \textrm{for} \: s_t \in S \: \textrm{do} \)
\( \qquad\qquad\qquad V(s_t) = \textrm{max}_{a_t} \Big( R(s_t,a_t) + \gamma \sum_{s_{t+1} \in S} T(s_t, a_t, s_{t+1}) \: V(s_{t+1}) \Big)\)
\( \qquad \textrm{return} \: V \)
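As a concrete illustration, the algorithm above can be sketched in Python. The tiny three-state MDP below (its states, transition table, and rewards) is a made-up example for illustration, not part of the lecture:

```python
# Value iteration sketch on a tiny, hypothetical 3-state MDP.
# T[s][a] is a list of (next_state, probability) pairs; R[s][a] is the reward.
GAMMA = 0.9

S = ["s0", "s1", "s2"]
A = ["stay", "go"]
T = {
    "s0": {"stay": [("s0", 1.0)], "go": [("s1", 0.8), ("s0", 0.2)]},
    "s1": {"stay": [("s1", 1.0)], "go": [("s2", 0.8), ("s1", 0.2)]},
    "s2": {"stay": [("s2", 1.0)], "go": [("s2", 1.0)]},
}
R = {
    "s0": {"stay": 0.0, "go": -1.0},
    "s1": {"stay": 0.0, "go": -1.0},
    "s2": {"stay": 10.0, "go": 10.0},
}

def value_iteration(tol=1e-6):
    # Initialize V(s) to the minimum one-step reward, as in the algorithm above.
    V = {s: min(R[s][a] for a in A) for s in S}
    while True:  # repeat until convergence
        delta = 0.0
        for s in S:
            # Bellman backup: max over actions of expected discounted return.
            new_v = max(
                R[s][a] + GAMMA * sum(p * V[s2] for s2, p in T[s][a])
                for a in A
            )
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V

V_star = value_iteration()
```

States closer to the rewarding state `s2` end up with higher converged values, which is exactly what the greedy policy later exploits.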

For this algorithm, we assume that we are working with a Markov decision process \( \langle S, A, T, R, \gamma \rangle \), where \(S\) is the set of states, \(A\) is the set of actions, \(T(s_t, a_t, s_{t+1})\) gives the probability of transitioning to state \(s_{t+1}\) after taking action \(a_t\) in state \(s_t\), \(R(s_t, a_t)\) is the reward for taking action \(a_t\) in state \(s_t\), and \(\gamma \in [0, 1)\) is the discount factor.

Once we have completed the value iteration process, we can specify an optimal policy for selecting actions based on our converged value function (\(V^*\)), as shown in the algorithm below.

\(\textrm{Algorithm execute_MDP_policy}(s_t, V^*): \)
\( \qquad \textrm{return} \: \: \underset{a_t}{\textrm{argmax}} \:\Big[ R(s_t, a_t) + \gamma \sum_{s_{t+1} \in S} T(s_t, a_t, s_{t+1}) \: V^*(s_{t+1}) \Big] \)
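A greedy policy extraction can be sketched in Python as below. The two-state MDP and the converged value function here are hypothetical numbers chosen for illustration:

```python
# Greedy policy extraction sketch for a tiny, hypothetical 2-state MDP.
# V_star stands in for a converged value function from value iteration.
GAMMA = 0.9
A = ["left", "right"]
T = {
    "a": {"left": [("a", 1.0)], "right": [("b", 1.0)]},
    "b": {"left": [("a", 1.0)], "right": [("b", 1.0)]},
}
R = {
    "a": {"left": 0.0, "right": 0.0},
    "b": {"left": 0.0, "right": 1.0},
}
V_star = {"a": 9.0, "b": 10.0}  # illustrative converged values

def execute_mdp_policy(s, V):
    # argmax over actions of R(s, a) + gamma * sum_s' T(s, a, s') * V(s')
    return max(
        A,
        key=lambda a: R[s][a] + GAMMA * sum(p * V[s2] for s2, p in T[s][a]),
    )
```

In both states the action moving toward the rewarding state `b` has the higher one-step lookahead value, so the greedy policy selects it.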

A* Pathfinding Algorithm


The A* pathfinding algorithm can be used to find the most efficient path from a start location to a goal location. The A* algorithm makes the following significant assumptions:

The A* algorithm computes the following terms for each node \(n\) in the graph: \(g(n)\), the cost of the best known path from the start node to \(n\); \(h(n)\), a heuristic estimate of the cost from \(n\) to the goal node; and \(f(n) = g(n) + h(n)\), the total estimated cost of a path through \(n\).

The A* algorithm is implemented as follows:
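One common way to implement A* is with a priority queue ordered by \(f\) value. The sketch below is a generic Python version, not the exact implementation from lecture; the `grid_neighbors` and `octile` helpers at the bottom are hypothetical examples using the 10/14 grid costs from the example that follows:

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A* sketch: neighbors(n) yields (neighbor, step_cost) pairs and
    h(n) is the heuristic estimate from n to the goal (assumed admissible)."""
    g = {start: 0}                  # best known cost from start to each node
    parent = {start: None}
    open_set = [(h(start), start)]  # priority queue ordered by f = g + h
    closed = set()
    while open_set:
        f, current = heapq.heappop(open_set)
        if current == goal:
            # Reconstruct the path by walking parent pointers back to start.
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1], g[goal]
        if current in closed:
            continue
        closed.add(current)
        for nbr, cost in neighbors(current):
            tentative = g[current] + cost
            if nbr not in g or tentative < g[nbr]:
                g[nbr] = tentative
                parent[nbr] = current
                heapq.heappush(open_set, (tentative + h(nbr), nbr))
    return None, float("inf")  # no path exists

# Hypothetical usage on an obstacle-free 3x3 grid with 10/14 move costs.
def grid_neighbors(n):
    x, y = n
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            nx, ny = x + dx, y + dy
            if 0 <= nx < 3 and 0 <= ny < 3:
                yield (nx, ny), 14 if dx and dy else 10

def octile(n):
    dx, dy = abs(n[0] - 2), abs(n[1] - 2)  # distance to goal (2, 2)
    return 14 * min(dx, dy) + 10 * abs(dx - dy)

path, cost = a_star((0, 0), (2, 2), grid_neighbors, octile)
```

Note the lazy-deletion pattern: a node may be pushed onto the queue more than once with different \(f\) values, and stale entries are skipped via the `closed` set when popped.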

A Grid World Example


In class lecture, we cover the following example grid world. We assume that the movement cost of moving one grid cell up, right, down, and left is 10 units. We assume that moving diagonally has a movement cost of 14 units.
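These costs approximate straight moves (cost 1) and diagonal moves (cost \(\sqrt{2} \approx 1.4\)) scaled by 10 to keep the arithmetic in integers. A matching distance heuristic can be computed as below (a small sketch; the function name and coordinate convention are assumptions for illustration):

```python
# Heuristic for a grid with straight cost 10 and diagonal cost 14
# (14 approximates 10 * sqrt(2)). Nodes are (column, row) tuples.
def heuristic(node, goal):
    dx = abs(node[0] - goal[0])
    dy = abs(node[1] - goal[1])
    # Move diagonally min(dx, dy) times, then straight for the remainder.
    return 14 * min(dx, dy) + 10 * abs(dx - dy)
```

Because this never overestimates the true movement cost under the 10/14 cost model, it keeps the heuristic admissible.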

grid world

The algorithm starts with the start node. We compute the \(g\), \(h\), and \(f\) values for each of the neighbors of the start node. In the diagram below, you can find the \(g\) value in the upper left corner, the \(h\) value in the upper right corner, and the \(f\) value in the center of each grid cell.

a star step 1

Our next current node will be the node with the lowest \(f\) value, 42. We will next compute the \(g\), \(h\), and \(f\) values for each of its neighbors.

a star step 2

The upper left neighbor of the current node has the lowest \(f\) value (still 42), so we will select it to be our next current node, computing the \(g\), \(h\), and \(f\) values for each of its neighbors.

a star step 3

Finally, we select the upper left node as the next current node and find that it is the end node, so the algorithm terminates. We now have a best path from start to end with a final movement cost (\(f\) value) of 42.

Class Exercise: A* Algorithm for a Grid World with Obstacles


Please work in a group of 2-3 students for this exercise. We will execute the rest of the A* algorithm on the same grid world, but this time, we have an obstacle.

a star step 3
Note: You can find the solution to this exercise on this page.

Acknowledgments


The value iteration component of today's lecture was informed by Probabilistic Robotics by Sebastian Thrun, Wolfram Burgard, and Dieter Fox as well as Kaelbling et al. (1996) and Charles Isbell and Michael Littman's Udacity course on Reinforcement Learning. The A* algorithm content of today's class was informed by the A* Wikipedia page as well as Sebastian Lague's YouTube video on A* Pathfinding. The lecture for today also features this YouTube video about the Shakey robot.