Your goal in this project is to implement a Q-Learning algorithm to give your robot the ability to learn how to organize items in its environment using reinforcement learning. This project will also involve components of both (1) perception, to detect the items to organize and the drop-off locations, and (2) control, to make the robot arm pick up the items and then navigate to the locations where they are to be dropped off. As before, if you have any questions about this project or find yourself getting stuck, please post on the course Slack or send a Slack DM to the teaching team. Even if you don't find yourself hitting roadblocks, feel free to share with your peers what's working well for you.
You are expected to work with one other student for this project, who must be different from your particle filter project partner. If you strongly prefer working by yourself, please reach out to the teaching team to discuss your individual case. A team of 3 will only be allowed if there is an odd number of students. Your team will submit your code and writeup together (in one GitHub repo).
If you are looking for a partner, send a Slack message in the #looking-for-teammates channel.
For questions, please post them in the #q-learning-project channel.
Like before, you'll submit this project on Github (both the code and the writeup). Have one team member fork our starter git repo to get our starter code and so that we can track your project. Both of your team members will contribute code to this one forked repo.
Please put your implementation plan within your README.md file. Your implementation plan should contain the following:
Modify the README.md file as your writeup for this project. Please add pictures, YouTube videos, and/or embedded animated gifs to showcase and describe your work. Your writeup should contain:
The code that you develop for this project should be in new Python ROS nodes that you create within the scripts folder and in the two empty launch files, training.launch and action.launch, within the launch folder. Besides the /scripts/q_learning.py file, DO NOT EDIT ANY OTHER SCRIPTS OR MESSAGE FILES WE PROVIDE IN THE STARTER GIT REPO (e.g., scripts/reset_world.py, scripts/phantom_robot_movement.py, or any of the custom message files in the /msg directory). If you wish to create additional launch files or additional custom ROS messages, you're welcome to do that. We just ask that you don't edit the launch files, ROS nodes, and custom messages that we've provided in the starter code.
training.launch should have node(s) that:
virtual_reset_world.py
q_learning.py
action.launch should have node(s) that:
Note that action.launch should run on top of turtlebot3_intro_robo_manipulation.launch.
In your writeup, include a gif of your robot successfully executing the task once your Q matrix has converged.
Record a run of your Q-learning algorithm during training in a rosbag. Please record the following topics: /cmd_vel, /gazebo/set_model_state, /q_learning/q_matrix, /q_learning/reward, /q_learning/robot_action, /scan, and any other topics you generate and use in this project. Please do not record all of the topics, since the camera topics make the rosbags very large. For ease of use, here's how to record a rosbag:
$ rosbag record -O filename.bag topic-names
Please refer to the ROS Resources page for further details on how to record a rosbag.
The final deliverable is ensuring that each team member completes the Partner Contributions Google Survey. The purpose of this survey is to accurately capture the contributions of each partner to your combined q-learning project deliverables.
Will be published soon.
As was true with the prior projects, we will consider your latest commit before 11:00 AM CST as your submission for each deadline. Do not forget to push your changes to your forked GitHub repos. You do not need to email us your repo link, since we will be able to track your repo via your fork. If you want to use any of your flex late hours for this assignment, please send a group DM on Slack to all of the teaching staff (so we know to clone your code at the appropriate commit for grading).
Your goal in this project is to computationally determine what actions the robot should take in order to achieve the goal state (where each colored dumbbell is placed in front of the correct numbered block) using reinforcement learning. Conceptually, your program will be in either of two phases:
To launch the Gazebo world for this project, run:
$ roslaunch q_learning_project turtlebot3_intro_robo_manipulation.launch
You should see the world pictured below. Our Turtlebot3 is now equipped with an OpenMANIPULATOR arm.
One important feature to be aware of is that this Gazebo world file is launched with the parameter paused set to true (turtlebot3_intro_robo_manipulation.launch lines 4 and 12). Whenever you want to start running, you'll need to press the play button in the bottom left-hand corner of your Gazebo window (circled in light blue in the picture above).
As we mentioned above, your goal in this project is to computationally determine what actions the robot should take in order to achieve the goal state (where each colored dumbbell is placed in front of the correct numbered block) using reinforcement learning, and specifically, Q-learning. You will implement your Q-learning algorithm in one or more new ROS nodes that you'll write in one or more new Python files within the /scripts directory. Once you have all of your nodes working, fill in the training.launch file so that it launches all of your nodes and trains the Q-matrix with a single roslaunch command. Your Q-learning algorithm should proceed as follows:
\(\textrm{Algorithm Q_Learning}:\)
\( \qquad \textrm{initialize} \: Q \)
\( \qquad t = 0 \)
\( \qquad \textrm{while} \: Q \: \textrm{has not converged:} \)
\( \qquad \qquad \textrm{select} \: a_t \: \textrm{at random} \)
\( \qquad \qquad \textrm{perform} \: a_t \)
\( \qquad \qquad \textrm{receive} \: r_t \)
\( \qquad \qquad Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \cdot \Big( r_t + \gamma \cdot \: \textrm{max}_a Q(s_{t+1}, a) - Q(s_t, a_t) \Big)\)
\( \qquad \qquad t = t + 1\)
Where:
You will need to publish and subscribe to several ROS topics to complete your Q-learning task:
ROS Topic | ROS msg Type | Notes |
/q_learning/q_matrix | q_learning/QMatrix | Each time you update your Q-matrix in your Q-learning algorithm, publish your Q-matrix to this topic. |
/q_learning/reward | q_learning/QLearningReward | You will subscribe to this topic to receive from the environment the reward after each action you take. |
/q_learning/robot_action | q_learning/RobotMoveDBToBlock | Every time you want to execute an action, publish a message to this topic (this is the same topic you'll be subscribing to in the node you write to have your robot execute the actions). |
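The update step of the Q-learning algorithm above can be written as a small Python function. This is a minimal sketch, not the starter code. It assumes Q is held as a list of rows indexed as q[state][action], and the default values of alpha (learning rate) and gamma (discount factor) are placeholders you would choose yourself.

```python
def q_update(q, s_t, a_t, r_t, s_next, alpha=1.0, gamma=0.8):
    """Apply one Q-learning update in place and return the new Q(s_t, a_t).

    q      -- list of rows; q[state][action] is the current Q-value
    s_t    -- state before the action
    a_t    -- action that was performed
    r_t    -- reward received for the action
    s_next -- state after the action (s_{t+1})
    alpha, gamma -- learning rate and discount factor (placeholder defaults)
    """
    best_next = max(q[s_next])  # max_a Q(s_{t+1}, a)
    q[s_t][a_t] += alpha * (r_t + gamma * best_next - q[s_t][a_t])
    return q[s_t][a_t]


# Tiny example: 2 states x 2 actions, all zeros except one future value.
q = [[0.0, 0.0], [0.0, 10.0]]
print(q_update(q, s_t=0, a_t=1, r_t=0, s_next=1, alpha=0.5, gamma=0.8))  # prints 4.0
```

In a training node, you would call something like this once per reward received on /q_learning/reward. You would pass the state you were in, the action you published, the reward, and the resulting state, and then publish the updated matrix on /q_learning/q_matrix.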
For this project, we will represent \(Q\) as a matrix, where the rows correspond to the possible world states \(s_t\) and the columns correspond to the actions \(a_t\) the robot can take. The actions that the robot can take are available through self.actions in q_learning.py of the starter code. They are organized as follows:
action number (column) | move dumbbell | to block number |
0 | red | 1 |
1 | red | 2 |
2 | red | 3 |
3 | green | 1 |
4 | green | 2 |
5 | green | 3 |
6 | blue | 1 |
7 | blue | 2 |
8 | blue | 3 |
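The action table follows a regular pattern: dumbbell colors in the order red, green, blue, each paired with blocks 1 to 3. Assuming self.actions in the starter code uses exactly this ordering (check it against q_learning.py), an action number converts to the two fields of a q_learning/RobotMoveDBToBlock message, robot_db and block_id, with simple arithmetic. The helper names below are hypothetical.

```python
COLORS = ["red", "green", "blue"]  # order used in the action table


def action_to_move(action):
    """Map an action number (0-8) to (robot_db, block_id)."""
    if not 0 <= action < 9:
        raise ValueError("action must be in the range 0-8")
    return COLORS[action // 3], action % 3 + 1


def move_to_action(robot_db, block_id):
    """Inverse mapping: (robot_db, block_id) -> action number."""
    return COLORS.index(robot_db) * 3 + (block_id - 1)


print(action_to_move(5))          # ('green', 3)
print(move_to_action("blue", 1))  # 6
```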
There are 64 total states for the system to be in, which are also available to you in self.states within q_learning.py:
state number (row) | red dumbbell location | green dumbbell location | blue dumbbell location |
0 | origin | origin | origin |
1 | block 1 | origin | origin |
2 | block 2 | origin | origin |
3 | block 3 | origin | origin |
4 | origin | block 1 | origin |
5 | block 1 | block 1 | origin |
6 | block 2 | block 1 | origin |
7 | block 3 | block 1 | origin |
8 | origin | block 2 | origin |
9 | block 1 | block 2 | origin |
10 | block 2 | block 2 | origin |
11 | block 3 | block 2 | origin |
12 | origin | block 3 | origin |
13 | block 1 | block 3 | origin |
14 | block 2 | block 3 | origin |
15 | block 3 | block 3 | origin |
... | ... | ... | ... |
63 | block 3 | block 3 | block 3 |
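The state rows follow a base-4 pattern. Code each dumbbell location as 0 for the origin and 1 to 3 for blocks 1 to 3. The red dumbbell then changes fastest and the blue dumbbell slowest. Assuming the elided rows (16 to 62) continue this pattern, which is consistent with row 63, the state number can be computed directly. This sketch is only for checking your understanding; the authoritative list is still self.states.

```python
def state_index(red, green, blue):
    """Return the state number for the given dumbbell locations.

    Each location is 0 for the origin or 1-3 for the numbered block.
    """
    return red + 4 * green + 16 * blue


def state_locations(state):
    """Inverse of state_index: state number -> (red, green, blue)."""
    return state % 4, (state // 4) % 4, state // 16


print(state_index(1, 1, 0))  # 5  (red and green both at block 1)
print(state_locations(12))   # (0, 3, 0)
```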
Where
In addition to a Q-matrix, we have provided you with an action matrix, available through self.action_matrix in q_learning.py. The rows of the action matrix represent a starting state \((s_t)\) and the columns represent the next state \((s_{t+1})\). The matrix is set up such that \(\textrm{action_matrix}[s_t][s_{t+1}] = a_t \). Let's examine the example of \(\textrm{action_matrix}[0][12] = 5 \). In this case \(s_t = 0\), where all three dumbbells are at the origin; \(s_{t+1} = 12\), where the red and blue dumbbells are at the origin and the green dumbbell is at block 3; and \(a_t = 5\), which is the action corresponding to the robot taking the green dumbbell to block number 3.
All transitions from \(s_t\) to \(s_{t+1}\) that are impossible or invalid are assigned a value of -1. For example, since the robot can only carry one dumbbell at a time, the transition from state 0 to 6 is impossible. Additionally, only one dumbbell can sit in front of one numbered block at a time, so any transition to state 5 (where both the red and green dumbbells are at block 1) is also impossible and is assigned the value -1.
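To see where the -1 entries come from, here is one way such a matrix could be rebuilt from the two rules above. This is a hypothetical reconstruction, not the starter code's implementation. It adds one assumption the text does not state outright: a dumbbell can only be picked up while it is still at the origin. Locations are coded 0 for the origin and 1 to 3 for blocks, and states and actions are numbered as in the two tables.

```python
def build_action_matrix():
    """Hypothetical rebuild of the action matrix; in your project, use self.action_matrix instead.

    Locations are coded 0 = origin, 1-3 = block number; states are (red, green, blue).
    """
    states = [(s % 4, (s // 4) % 4, s // 16) for s in range(64)]

    def is_valid(locs):
        placed = [loc for loc in locs if loc != 0]
        return len(placed) == len(set(placed))  # at most one dumbbell per block

    matrix = [[-1] * 64 for _ in range(64)]
    for s, before in enumerate(states):
        for s_next, after in enumerate(states):
            moved = [i for i in range(3) if before[i] != after[i]]
            # Exactly one dumbbell may move (the robot carries one at a time),
            # and the resulting state must be valid.
            if len(moved) != 1 or not is_valid(after):
                continue
            db = moved[0]
            # Assumption: only a dumbbell still at the origin can be picked up.
            if before[db] != 0:
                continue
            # Action number as in the action table: 3 * dumbbell index + (block - 1).
            matrix[s][s_next] = 3 * db + (after[db] - 1)
    return matrix


action_matrix = build_action_matrix()
print(action_matrix[0][12])  # 5: green dumbbell to block 3
print(action_matrix[0][6])   # -1: two dumbbells would have to move
```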
The value of having this action matrix comes into play when we're executing the \( \textrm{select} \: a_t \: \textrm{at random} \) step of the Q-learning algorithm. In order to select a random action, we take our current state \((s_t)\), and look up the row corresponding with that state in the action matrix. All the values that are not -1 represent valid actions that the robot can take from state \(s_t\). You can then pick one of these at random.
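Here is a minimal sketch of that lookup. It assumes the action matrix is indexable as action_matrix[state][next_state], either as a list of lists or as a NumPy array. Returning the next state along with the action is convenient: when training against the virtual world, it tells you which row of Q to treat as \(s_{t+1}\). The function name is hypothetical.

```python
import random


def select_random_action(action_matrix, state):
    """Pick a random valid action from `state`.

    Returns (action, next_state), or None if no valid action exists
    (e.g. all dumbbells have already been placed).
    """
    options = [(int(a), s_next) for s_next, a in enumerate(action_matrix[state]) if a != -1]
    if not options:
        return None
    return random.choice(options)


# Toy 3-state matrix: from state 0 you can take action 4 (to state 1) or action 7 (to state 2).
toy = [[-1, 4, 7], [-1, -1, -1], [-1, -1, -1]]
print(select_random_action(toy, 0))  # (4, 1) or (7, 2)
print(select_random_action(toy, 1))  # None
```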
As highlighted in the Q-learning algorithm, you will iterate through the while loop, updating your Q-matrix, until your Q-matrix has converged. What we mean by "convergence" in this context is that your Q-matrix has reached its final form and no more updates or changes will occur to it. It's up to you to determine how to ascertain when your Q-matrix has converged.
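One common heuristic is to declare convergence once no Q-value has changed by more than a small threshold for some number of consecutive updates. This is only one option among many, not a requirement of the project. The threshold and window below are placeholder values you would need to tune. Too small a window can stop training before rarely visited states have been updated.

```python
class ConvergenceTracker:
    """Declare convergence after `window` consecutive updates whose change is <= `epsilon`."""

    def __init__(self, epsilon=1e-3, window=500):
        self.epsilon = epsilon
        self.window = window
        self.unchanged_count = 0

    def record(self, old_value, new_value):
        """Call once per Q-update; returns True once Q is considered converged."""
        if abs(new_value - old_value) > self.epsilon:
            self.unchanged_count = 0  # a meaningful change: restart the count
        else:
            self.unchanged_count += 1
        return self.unchanged_count >= self.window


tracker = ConvergenceTracker(epsilon=0.01, window=3)
changes = [(0, 5), (5, 5), (5, 5.001), (5.001, 5.001)]
print([tracker.record(old, new) for old, new in changes])  # [False, False, False, True]
```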
In order to reach convergence of your Q-matrix, you're going to have to run many iterations of having the robot place the dumbbells in front of the numbered blocks and debug your code frequently. To make debugging easier, we've created two different nodes to help you with that:
$ rosrun q_learning_project phantom_robot_movement.py
$ rosrun q_learning_project virtual_reset_world.py
The phantom robot movement ROS node subscribes to robot actions given on the /q_learning/robot_action ROS topic, so as long as you're sending robot actions on this topic and have the phantom robot movement node running, you should see something like what's pictured in the following gif.
This phantom robot movement node is designed to help you debug your learning code. Once you have ensured that your code works properly, you'll need to shut down the phantom robot movement ROS node and use the "virtual reset world" node that will let you quickly iterate through the Q-learning algorithm until you reach convergence.
The virtual reset world works by responding to robot actions given on the /q_learning/robot_action ROS topic and publishing rewards to the /q_learning/reward topic based on the actions it receives. It does this without manipulating the Gazebo world, which makes it faster. If you are operating it correctly, you should see an output like the following:
robot_db: "blue"
block_id: 3
Published reward: 0
robot_db: "red"
block_id: 2
Published reward: 0
robot_db: "green"
block_id: 1
Published reward: 0
reseting the world
robot_db: "blue"
block_id: 3
Published reward: 0
robot_db: "red"
block_id: 1
Published reward: 0
robot_db: "green"
block_id: 2
Published reward: 0
reseting the world
Tips:
/q_learning/robot_action messages properly) and that you use the virtual reset world node for RL training, converging your Q-matrix over many iterations.
Training the Q-matrix will often take some time, and communicating this matrix to the next phase of operations could lead to issues. Hence, we ask you to save your Q-matrix in an appropriate file (e.g., a CSV file) once it has converged. The action phase of this project will need to read/load the trained Q-matrix from this file.
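Here is a minimal sketch using Python's standard csv module. It assumes the Q-matrix is held as a list of rows of numbers; if you use a NumPy array, numpy.savetxt and numpy.loadtxt with delimiter="," work as well. The file name is a placeholder. In practice you might build an absolute path from the script's location so that both launch files find the same file.

```python
import csv
import os
import tempfile


def save_q_matrix(q, path):
    """Write one CSV row per state."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(q)


def load_q_matrix(path):
    """Read the CSV back into a list of rows of floats."""
    with open(path, newline="") as f:
        return [[float(v) for v in row] for row in csv.reader(f)]


q = [[0.0, 12.5, 0.0], [100.0, 0.0, 3.25]]
path = os.path.join(tempfile.gettempdir(), "q_matrix_example.csv")  # placeholder location
save_q_matrix(q, path)
print(load_q_matrix(path) == q)  # True
```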
One key component of this project is building a ROS node that can execute actions published to the /q_learning/robot_action ROS topic (with a custom message type of q_learning/RobotMoveDBToBlock). When your ROS node receives a message on this topic, it should:
robot_db attribute of the q_learning/RobotMoveDBToBlock message
block_id attribute of the q_learning/RobotMoveDBToBlock message, and
Write your code for this node in new Python file(s) within the /scripts directory. Once you have all of your nodes working, fill in the action.launch file so that it launches all of your nodes and executes appropriate actions based on the learned Q-matrix. The following subsections give more details and helpful tips on the perception and robot manipulator control components of programming these robot actions.
In order to pick up the dumbbells and carry them to the numbered block locations, your robot will need to be able to perceive:
To launch the Turtlebot3 RViz window, run:
$ roslaunch turtlebot3_gazebo turtlebot3_gazebo_rviz.launch
This should bring up an RViz window like the one pictured below.
One important thing to note is that you can visualize what the robot sees through its RGB camera by checking the checkbox next to "Camera" (see the image above). Once you do, you can see "through the eyes of the robot" (see the image below).
You'll likely want to use a combination of the /scan and /camera/rgb/image_raw ROS topics to identify and locate the objects in the environment. For detecting the dumbbells, you're more than welcome to leverage the code that we used for the line follower in class meeting 03.
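In the spirit of the line follower, one approach is to threshold the camera image for the target color and steer toward the centroid of the matching pixels. The sketch below uses only NumPy so it can be tested without a camera. It assumes the image has already been converted to HSV, for example with cv_bridge's CvBridge().imgmsg_to_cv2(msg, desired_encoding="bgr8") followed by cv2.cvtColor(image, cv2.COLOR_BGR2HSV). In OpenCV, hue runs from 0 to 179. The HSV bounds and the gain k_p are hypothetical values you would tune in Gazebo.

```python
import numpy as np


def color_centroid_x(hsv, lower, upper):
    """Return the x pixel coordinate of the centroid of pixels within [lower, upper], or None."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    mask = np.all((hsv >= lower) & (hsv <= upper), axis=2)  # True where all 3 channels are in range
    xs = np.nonzero(mask)[1]  # column indices of matching pixels
    if xs.size == 0:
        return None
    return float(xs.mean())


def steering_command(cx, image_width, k_p=0.005):
    """Proportional angular velocity toward the target (positive = turn left)."""
    error = image_width / 2.0 - cx
    return k_p * error


# Synthetic 4x10 HSV image with a "green" blob in columns 6-7.
img = np.zeros((4, 10, 3), dtype=np.uint8)
img[:, 6:8] = (60, 255, 255)
cx = color_centroid_x(img, lower=(50, 100, 100), upper=(70, 255, 255))
print(cx)                        # 6.5
print(steering_command(cx, 10))  # negative, so the robot turns right toward the blob
```

The result would typically go into angular.z of a Twist published on /cmd_vel. The /scan distances can then tell you when the robot is close enough to stop in front of the dumbbell.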
While you are free to use any method you can find online for recognizing the digits on the blocks, we recommend keras_ocr, which provides pre-trained models as well as an end-to-end training pipeline for character recognition. At a high level, you input an image and get back the characters found in the image along with their locations. For this project, you only need to use the pre-trained models.
To use keras_ocr, you first need to install it via:
$ pip install keras-ocr
Next, set up the pipeline in your Python script:
import cv2
import keras_ocr
.
.
.
# download the pre-trained detector and recognizer models
pipeline = keras_ocr.pipeline.Pipeline()
# Once you have the pipeline, you can use it to recognize characters.
# images is a list of images in the cv2 format (BGR); keras_ocr expects RGB,
# so convert each image first
images = [cv2.cvtColor(img, cv2.COLOR_BGR2RGB) for img in [img1, img2, ...]]
# call the recognizer on the list of images
prediction_groups = pipeline.recognize(images)
# prediction_groups is a list of predictions, one list per image
# prediction_groups[0] is a list of tuples for the words recognized in img1
# the tuples are of the format (word, box), where word is the word recognized
# and box is a 4x2 array of the (x, y) corners of the region where that word resides
For more information, please visit the documentation.
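Because each prediction is a (word, box) pair, locating a particular block reduces to searching the predictions for the digit you want and taking the center of its box. Below is a minimal sketch. It assumes each box is the 4x2 array of (x, y) corner points described above. The helper name and the sample predictions are made up for illustration. OCR output can be noisy, so you may also want to handle misreadings.

```python
import numpy as np


def find_digit_center(predictions, digit):
    """Return the (x, y) center of the first box whose word equals `digit`, or None.

    predictions -- list of (word, box) tuples for one image, where box is a
                   4x2 array of corner points (x, y)
    """
    for word, box in predictions:
        if word.strip() == str(digit):
            x, y = np.asarray(box, dtype=float).mean(axis=0)  # average of the four corners
            return float(x), float(y)
    return None


# Made-up predictions for one image.
sample = [
    ("3", np.array([[10, 5], [30, 5], [30, 40], [10, 40]])),
    ("1", np.array([[200, 8], [220, 8], [220, 44], [200, 44]])),
]
print(find_digit_center(sample, 1))  # (210.0, 26.0)
print(find_digit_center(sample, 2))  # None
```

Comparing the center's x coordinate with half the image width gives a steering error, just as with the dumbbell colors.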
Tips:
In order to enable your robot to pick up the dumbbells, you'll need to get familiar with programming the Turtlebot3's OpenMANIPULATOR arm. Here's a list of resources to help you get up and running:
The following gifs show examples of the Turtlebot3 OpenMANIPULATOR arm picking up one of the dumbbells (note: one of these was in a prior iteration of the development of this project before the dumbbells had colors).
Tips:
Once your Q-matrix converges and has been saved to an appropriate file, you now have a Q-matrix that contains information about future expected reward for robot actions. You can now use the Q-matrix to make decisions about actions to take that will lead to the highest expected future reward. To do this, load your Q-matrix, take your current state \(s_t\) and look up the corresponding row in your Q-matrix. In this row, find the action (column) that corresponds with the highest Q-value. This is the action that will lead to the highest expected future reward.
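Here is a sketch of that lookup with one refinement: it only considers actions that are valid from the current state, so an invalid action can never be chosen just because of its Q-value. It assumes Q and the action matrix are indexed as in the tables above (states as rows, action numbers as Q columns, action_matrix[s_t][s_next] = a_t). The function name is hypothetical.

```python
def best_action(q, action_matrix, state):
    """Return (action, next_state) with the highest Q-value among valid actions, or None."""
    best = None
    for s_next, a in enumerate(action_matrix[state]):
        if a == -1:
            continue  # impossible or invalid transition
        a = int(a)
        if best is None or q[state][a] > q[state][best[0]]:
            best = (a, s_next)
    return best


# Toy example: from state 0, actions 0 and 2 are valid (leading to states 1 and 2).
q = [[5.0, 99.0, 8.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
action_matrix = [[-1, 0, 2], [-1, -1, -1], [-1, -1, -1]]
print(best_action(q, action_matrix, 0))  # (2, 2): action 1 has a higher Q-value but is not valid here
```

Repeating this from the returned next state until no valid action remains yields the full sequence of actions for the robot to execute.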
Once you have implemented everything that's outlined above, this is how your program should work:
Run training.launch. This should launch your training nodes along with virtual_reset_world.py and save a converged Q-matrix into a file. Make sure to let the user know once your matrix has converged. Once the Q-matrix has been saved into a file, stop this command by pressing Ctrl-C.
Run turtlebot3_intro_robo_manipulation.launch.
Run action.launch. This should launch ROS nodes that read in the saved Q-matrix and execute the robot action commands to pick up the dumbbells and place them in front of the numbered blocks.
The design of this course project was influenced by Brian Scassellati and his Intelligent Robotics course taught at Yale University. Also, I want to thank my sister, Rachel Strohkorb, for creating the custom dumbbell model for our use in the Gazebo simulator.