Beyond the steps you've already taken in Lab 3, this lab requires a few additional setup steps. Follow these in order.
python3 --version
If needed, install a newer Python and recreate your virtual environment.
Inside hri_course_misty_programming, create a folder named lab_5_LLM_based_human_robot_dialogue for the lab code:
cd hri_course_misty_programming
mkdir lab_5_LLM_based_human_robot_dialogue
llm_based_human_robot_dialogue_revai.py - streams mic audio to Rev.ai via WebSocket. Lower latency, but the WebSocket SDK can have SSL handshake issues on some macOS setups.
llm_based_human_robot_dialogue_whisper.py - records a full utterance locally, then sends a WAV file to OpenAI Whisper. Simpler and more portable, with slightly more latency at the end of each utterance.
If you see SystemError: new style getargs format but argument is not a tuple on your machine, switch to the Whisper version.
source venv/bin/activate
pip install pyaudio
pip install google-genai
pip install openai
pip install mutagen
If you're using the Rev.ai version of the code, also run:
pip install rev-ai
.env file: Create a file named .env inside hri_course_misty_programming and paste the Gemini and OpenAI API keys you were provided via email/Canvas. If you're using the Rev.ai version, also add your REVAI_ACCESS_TOKEN.
Inside hri_course_misty_programming, either use our template repo or manually copy the starter files into lab_5_LLM_based_human_robot_dialogue.
hri_course_misty_programming/
├── venv/
├── Python-SDK/
├── lab_3_misty_introduction/
│   └── misty_introduction.py
├── lab_4_misty_woz_gui/
│   └── lab_4_misty_woz_gui.py
└── lab_5_LLM_based_human_robot_dialogue/
    ├── llm_based_human_robot_dialogue_revai.py     # Rev.ai version
    ├── llm_based_human_robot_dialogue_whisper.py   # Whisper version
    ├── three_good_things_system_instruction.txt
    └── test_dependencies.py
Important: Python-SDK and your lab folders should be directly inside hri_course_misty_programming. Do not put the SDK inside your lab folder.
python3 test_dependencies.py
If everything is configured correctly, the script should exit without errors.
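If the dependency test complains about a missing key, it can help to inspect the .env file directly. The following is a minimal sketch using only the standard library, not part of the starter code. The key names GEMINI_API_KEY and OPENAI_API_KEY are assumptions made for illustration (only REVAI_ACCESS_TOKEN is named in this lab), so adjust them to whatever names the starter code actually reads.

```python
from pathlib import Path

# Assumed key names; the starter code may expect different ones.
REQUIRED_KEYS = ["GEMINI_API_KEY", "OPENAI_API_KEY"]


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env_values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env_values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        print("Missing keys:", missing_keys(read_env_file(env_path)) or "none")
    else:
        print("No .env file found in", Path.cwd())
```

Run it from hri_course_misty_programming so that the relative path .env resolves to the file you created.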
PyAudio fails to install (portaudio.h file not found): PyAudio requires PortAudio to be installed at the system level before pip can build it. Install PortAudio via Homebrew first, then retry:
brew install portaudio
pip install pyaudio
If you don't have Homebrew installed, install it first:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
deactivate
brew install python@3.11
cd ~/hri_course_misty_programming
rm -rf venv
python3.11 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install google-genai openai python-dotenv mutagen pyaudio requests
If the Misty Python-SDK folder contains a requirements.txt, reinstall its dependencies too:
cd Python-SDK
pip install -r requirements.txt
cd ..
import pyaudio
p = pyaudio.PyAudio()
print(p.get_default_input_device_info())
If the reported defaultSampleRate is different from 16000 (e.g., 44100 on most Macs), update the AUDIO_RATE variable at the top of your dialogue file to match.
If recording does not stop after you finish talking (for example, because of background noise), raise SILENCE_THRESHOLD at the top of llm_based_human_robot_dialogue_whisper.py from 500 to 1000 or 1500. If it cuts you off before you're done talking, either lower SILENCE_THRESHOLD (to 200 or 300) or raise SILENCE_DURATION from 2.0 to 3.0.
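To build intuition for how these two settings interact, here is a minimal sketch of threshold-based end-of-utterance detection. It is not the starter code: it assumes 16-bit PCM chunks and uses RMS volume as the loudness measure, and the function names rms and chunks_until_stop are hypothetical. The starter code may measure volume differently.

```python
import array
import math

SILENCE_THRESHOLD = 500   # volume below this counts as silence
SILENCE_DURATION = 2.0    # seconds of continuous silence that end the utterance


def rms(chunk: bytes) -> float:
    """Root-mean-square volume of a chunk of 16-bit signed PCM audio."""
    samples = array.array("h", chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def chunks_until_stop(chunks, chunk_seconds,
                      threshold=SILENCE_THRESHOLD,
                      silence_duration=SILENCE_DURATION):
    """Return how many chunks are consumed before recording would stop.

    Recording stops once speech has been heard and is then followed by
    `silence_duration` seconds of consecutive quiet chunks. If that never
    happens, every chunk is consumed.
    """
    heard_speech = False
    silent_time = 0.0
    count = 0
    for chunk in chunks:
        count += 1
        if rms(chunk) >= threshold:
            heard_speech = True
            silent_time = 0.0          # any loud chunk resets the silence timer
        elif heard_speech:
            silent_time += chunk_seconds
            if silent_time >= silence_duration:
                break
    return count


# Synthetic audio: two quiet chunks, one loud chunk, then five quiet chunks.
loud = array.array("h", [2000] * 160).tobytes()
quiet = array.array("h", [100] * 160).tobytes()
demo = [quiet, quiet, loud] + [quiet] * 5
print(chunks_until_stop(demo, chunk_seconds=0.5))
```

In this sketch, a threshold set below the room's background volume means every chunk counts as speech and the silence timer never advances, which is why raising the threshold helps in noisy rooms.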
SystemError: new style getargs format but argument is not a tuple: This is a known SSL handshake bug in the websocket-client library that rev-ai depends on. It affects some macOS Python installations regardless of Python version. The Rev.ai version of the starter code includes a monkeypatch that resolves this for most students, but if you still hit the error after running it, switch to llm_based_human_robot_dialogue_whisper.py instead — the Whisper version does not use WebSockets and is not affected by this bug.
python3 -m http.server 8080
Then set HTTP_SERVER_PORT = 8080 at the top of your dialogue file.
During this lab, you will work with the same group that you worked with for Lab 4. Similar to Lab 4, each group will turn in one piece of code / set of deliverables.
With the starter code we've provided, in Lab 5 you are expected to:
Modify three_good_things_system_instruction.txt to enable the Misty robot to guide a human participant through the "Three Good Things" exercise.
Add 5 custom actions to the custom_actions dictionary of your dialogue file.
You are expected to upload the following to Canvas after you have completed the lab:
Your dialogue file (llm_based_human_robot_dialogue_revai.py or llm_based_human_robot_dialogue_whisper.py)
three_good_things_system_instruction.txt
To receive credit for this lab, you will need to submit your video and code to Canvas by Thursday, April 23, 2026 at 11:59pm.
Start a local HTTP server from the hri_course_misty_programming directory: python3 -m http.server
Set the HTTP_SERVER_PORT variable at the top of your dialogue file to match the server's port.
The starter code contains several files:
llm_based_human_robot_dialogue_revai.py - Rev.ai streaming version
llm_based_human_robot_dialogue_whisper.py - OpenAI Whisper record-and-transcribe version
three_good_things_system_instruction.txt - the system instruction for the Gemini generative text model
test_dependencies.py - used to test the dependency packages and API keys required for this lab
test_custom_actions.py - used to test the custom actions you will develop for Misty
gen_ai_test.py - used to test the Gemini generative text model based on the system instruction (three_good_things_system_instruction.txt) without needing to be connected to or run anything on the robot
While it is not required to know how the dialogue code works in detail for the purposes of completing this lab, I want to provide a brief overview for those interested in how it enables Misty to have a back-and-forth conversation with a person. This conversation consists of three main steps: speech-to-text, text generation, and text-to-speech.
Speech-to-text: This lab provides two interchangeable implementations for transcribing the human participant's speech to text. Both begin by turning Misty's LED blue and opening a local microphone stream using PyAudio in start_listening(); they differ in how the audio reaches the transcription service.
The Rev.ai version (llm_based_human_robot_dialogue_revai.py) streams mic audio chunks to Rev.ai's streaming API over a WebSocket as you speak, and Rev.ai returns partial and final hypotheses in real time. Once a final hypothesis is followed by a silence timeout, the transcript is stored in self.current_transcript.
The Whisper version (llm_based_human_robot_dialogue_whisper.py) records locally, watching the volume of each chunk: once it detects speech followed by a configurable duration of silence, it stops recording, saves the audio as a WAV file, and sends it to OpenAI's Whisper API in a single HTTP request. The returned transcript is stored in self.current_transcript.
Text generation: The code in this lab uses Gemini's text generation chat model via the new google-genai SDK, allowing for multi-turn conversations. The chat session is initialized using client.chats.create() with gemini-2.5-flash in the __init__ method of the starter code. The text generation occurs inside execute_human_robot_dialogue() via chat.send_message().
Text-to-Speech: The text generated by the Gemini model is then converted to speech using OpenAI's text-to-speech API. This conversion occurs inside execute_human_robot_dialogue() in the starter code, and the resulting audio file is then played on the robot.
The primary focus of this lab will be on prompt engineering. In the three_good_things_system_instruction.txt file, you will find a system instruction that is used to prompt the Gemini model to generate text for Misty. Right now, the system instruction guides the behavior of a robot receptionist in the CS department at UChicago. You will need to modify this system instruction to enable Misty to guide a human participant through the "Three Good Things" exercise.
If you want to test your system prompt independently from the Misty robot, you can do so by running gen_ai_test.py from the starter code in your terminal. This will allow you to communicate with the model only with text, enabling you to develop more quickly.
As a reminder, here is the desired interaction flow for the "Three Good Things" positive psychology exercise:
For this lab, you are asked to develop 5 additional custom actions for the robot. To develop these custom actions, we recommend you check out the following resources:
The test_custom_actions.py file in the starter code. This file will allow you to test just your custom actions without needing to run the whole "Three Good Things" exercise on the robot.
Once your custom actions work, add them to the custom_actions dictionary in your dialogue file and to the <your_expression> tag within three_good_things_system_instruction.txt.
The rest of this section delves into how the robot expressions are executed within the starter code.
In the starter code, we have defined four robot expressions, called actions in the Misty SDK, in the custom_actions dictionary at the top of your dialogue file:
custom_actions = {
    "reset": "IMAGE:e_DefaultContent.jpg; ARMS:40,40,1000; HEAD:-5,0,0,1000;",
    "head-up-down-nod": "IMAGE:e_DefaultContent.jpg; HEAD:-15,0,0,500; PAUSE:500; HEAD:5,0,0,500; PAUSE:500; HEAD:-15,0,0,500; PAUSE:500; HEAD:5,0,0,500; PAUSE:500; HEAD:-5,0,0,500; PAUSE:500;",
    "hi": "IMAGE:e_Admiration.jpg; ARMS:-80,40,100;",
    "listen": "IMAGE:e_Surprise.jpg; HEAD:-6,30,0,1000; PAUSE:2500; HEAD:-5,0,0,500; IMAGE:e_DefaultContent.jpg;"
}
While the actions are defined in string format in the custom_actions dictionary, they are registered on the Misty robot inside MistyRobot.__init__() in the starter code. When the Gemini model (self.chat) generates a text response for Misty to speak, it will also generate an action expression for the robot that corresponds with that text (e.g., "hi", "listen"), which is then parsed from the JSON response inside execute_human_robot_dialogue().
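When writing your own action strings, it can be useful to check their structure before trying them on the robot. The sketch below is not the SDK's parser. It only assumes the format visible in the dictionary above, where steps are separated by semicolons and each step has the form COMMAND:arg1,arg2,... It does not interpret what the arguments mean. The helper names parse_action and total_pause_ms are hypothetical.

```python
def parse_action(action: str):
    """Split an action string into a list of (command, [args]) steps."""
    steps = []
    for part in action.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the trailing semicolon
        command, _, arg_text = part.partition(":")
        args = [a.strip() for a in arg_text.split(",")] if arg_text else []
        steps.append((command.strip(), args))
    return steps


def total_pause_ms(action: str) -> int:
    """Sum the values of all PAUSE steps in an action string."""
    return sum(int(args[0])
               for command, args in parse_action(action)
               if command == "PAUSE")


listen = ("IMAGE:e_Surprise.jpg; HEAD:-6,30,0,1000; PAUSE:2500; "
          "HEAD:-5,0,0,500; IMAGE:e_DefaultContent.jpg;")

for command, args in parse_action(listen):
    print(command, args)
print("Total pause:", total_pause_ms(listen))
```

A check like this catches simple slips, such as a missing colon or a stray comma, before you register the action on Misty.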
These expressions can be generated by the Gemini model because the list of expressions the robot can execute is provided in the system instruction (three_good_things_system_instruction.txt):
<your_expression>
Your expression should be one of the ones from this list.
These expressions can represent how you are feeling or be a reaction to what the student has said.
Please refrain from choosing an expression multiple times in a row: [
'head-up-down-nod',
'hi',
'listen'
]
</your_expression>
After the expression is generated by the Gemini chat model, it is looked up in custom_actions and executed on the robot via self.misty.start_action() inside execute_human_robot_dialogue() in the starter code.
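The parse-and-lookup step described above might look roughly like the following sketch. It is not the starter code: the JSON field names "msg" and "expression" are assumptions made for illustration, and falling back to "reset" when the model returns an unknown expression is a design choice of this sketch, not something the starter code is known to do. The helper name parse_response is hypothetical.

```python
import json

# Two of the starter actions, copied from the custom_actions dictionary.
custom_actions = {
    "reset": "IMAGE:e_DefaultContent.jpg; ARMS:40,40,1000; HEAD:-5,0,0,1000;",
    "hi": "IMAGE:e_Admiration.jpg; ARMS:-80,40,100;",
}


def parse_response(response_text, actions, default="reset"):
    """Return (speech_text, action_name) from a model response.

    Assumes the model replies with JSON like
    {"msg": "...", "expression": "..."}; both keys are hypothetical.
    """
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError:
        # Not JSON: speak the raw text and use the default action.
        return response_text, default
    if not isinstance(data, dict):
        return response_text, default
    speech = data.get("msg", "")
    expression = data.get("expression", default)
    if expression not in actions:
        expression = default  # the model named an action we never registered
    return speech, expression


speech, action = parse_response('{"msg": "Hello!", "expression": "hi"}', custom_actions)
print(speech, "->", action)
```

Guarding the lookup this way means a misspelled or unregistered expression in the model's reply degrades to a neutral pose instead of raising a KeyError in the middle of a conversation.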
The final component for your assignment is exploring the voice options from OpenAI. In your dialogue file, the text-to-speech call inside execute_human_robot_dialogue() in the starter code looks like this:
# OpenAI text-to-speech: generating speech and saving to a file
with self.openai_client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="alloy",
    instructions="Speak with a calm and encouraging tone.",
) as response:
    response.stream_to_file(self.speech_file_path_local)
You will need to replace the voice and instructions parameters with your own selection. You can play around with the available voices and instructions for the voices at https://www.openai.fm/.