For this lab, use the starter archive provided on OCW:
The original repository contains the ROS2/Pupper files. The archive `llm_lab.zip` adds a simplified sandbox for testing the command pipeline before running it on the real robot.
Download `llm_lab.zip` and extract it in the root folder of the repository:
cd ~/lab_9_fall_2025
unzip llm_lab.zip
ls llm_lab
After extraction, the repository should contain:
lab_9_fall_2025/
└── llm_lab/
    ├── commands_config.json
    ├── llm_prompt.txt
    ├── command_parser.py
    ├── karel_pupper.py
    ├── karel_commander.py
    ├── mock_llm.py
    ├── run_karel_test.py
    ├── run_mock_pipeline.py
    ├── README.md
    └── outputs/
The folder `llm_lab` is used first for local testing. After the local pipeline works, students must connect the command parser to a real LLM and then use the validated commands to control the real Pupper robot.
In this lab, you will build a natural language command pipeline for Pupper.
The final pipeline is:
human command
-> real LLM parser
-> safety filter
-> KarelPupper API
-> real Pupper action
The LLM must not execute Python code directly. It is only allowed to return commands from a fixed list.
Allowed commands:
MOVE_FORWARD
MOVE_BACKWARD
TURN_LEFT
TURN_RIGHT
SIT
STAND
WAVE
STOP
The safety filter checks the LLM output before any robot command is executed.
Go to the lab folder:
cd ~/lab_9_fall_2025/llm_lab
Run:
python3 run_karel_test.py
Expected output:
[MOCK PUPPER] stand
[MOCK PUPPER] move forward
[MOCK PUPPER] turn left
[MOCK PUPPER] wave
[MOCK PUPPER] sit
[MOCK PUPPER] stop
This confirms that the high-level Pupper API works in mock mode.
Open:
nano commands_config.json
This file contains the commands that the robot is allowed to execute.
The safety rule is: a command that is not listed in this file must never reach the robot.
Open:
nano command_parser.py
Find the function:
sanitize_commands(raw_commands)
This function validates the output generated by the LLM.
The LLM output is never trusted directly. It must always pass through this safety layer before reaching the robot.
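To make the idea of an allow-list filter concrete, here is a minimal sketch. It is not the implementation in `command_parser.py`: the name `sanitize_commands_sketch`, the `MAX_COMMANDS` cap, and the choice to fall back to `STOP` when nothing valid remains are all assumptions made for illustration.

```python
# Allowed commands, copied from the list given earlier in this lab.
ALLOWED_COMMANDS = {
    "MOVE_FORWARD", "MOVE_BACKWARD", "TURN_LEFT", "TURN_RIGHT",
    "SIT", "STAND", "WAVE", "STOP",
}
MAX_COMMANDS = 10  # assumed cap on sequence length, not taken from the starter code


def sanitize_commands_sketch(raw_commands):
    """Keep only allowed commands; fall back to STOP if none survive."""
    validated = []
    for item in raw_commands:
        # Normalize case and strip whitespace and trailing punctuation.
        token = str(item).strip(" \t\r\n,.").upper()
        if token in ALLOWED_COMMANDS:
            validated.append(token)
        # Anything else (prose, code, unknown names) is dropped.
        if len(validated) >= MAX_COMMANDS:
            break
    return validated or ["STOP"]


print(sanitize_commands_sketch(["move_forward", "TURN_LEFT", "import os"]))
print(sanitize_commands_sketch(["JUMP_OFF_TABLE"]))
```

The first call keeps the two valid commands and discards the code-like line. The second call contains nothing valid, so the sketch returns `STOP` rather than an empty list.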
Run:
python3 run_mock_pipeline.py
Try commands such as:
Please move forward and then turn left.
Can you say hello?
Go back and stop.
Run into the wall.
The mock pipeline uses a simple local parser. This is only the baseline. It is used to understand the command flow before connecting a real LLM.
Your first task is to replace the mock parser with a real LLM call.
The LLM must receive the user command and return only commands from the allowed list.
Open the prompt file:
nano llm_prompt.txt
The prompt must force the model to follow these rules:
You must implement or adapt the LLM call so that a command such as:
Please move forward and then turn left.
returns:
MOVE_FORWARD
TURN_LEFT
After receiving the LLM output, pass it through the safety parser before executing it.
The required pipeline is:
user text
-> real LLM
-> raw command output
-> sanitize_commands()
-> KarelPupper API
Do not execute code generated by the LLM.
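One possible way to structure the LLM step is to keep the model behind a plain function, so the same code runs with a fake model offline and with a real one later. The sketch below makes several assumptions: `build_prompt`, `parse_with_llm`, and `fake_llm` are hypothetical names, the prompt wording is only an example, and the inline filter stands in for `sanitize_commands()` from `command_parser.py`.

```python
ALLOWED = ["MOVE_FORWARD", "MOVE_BACKWARD", "TURN_LEFT", "TURN_RIGHT",
           "SIT", "STAND", "WAVE", "STOP"]


def build_prompt(user_text):
    """Combine fixed instructions with the user's sentence (example wording)."""
    return (
        "Translate the request into robot commands.\n"
        "Answer with one command per line and nothing else.\n"
        "Use only these commands: " + ", ".join(ALLOWED) + "\n"
        "If the request is unsafe or unclear, answer STOP.\n"
        "Request: " + user_text
    )


def parse_with_llm(user_text, llm_fn):
    """Call the model, then keep only lines that name an allowed command."""
    raw = llm_fn(build_prompt(user_text))  # raw text; it is never executed
    lines = [line.strip().upper() for line in raw.splitlines()]
    # Stand-in for sanitize_commands(): drop every line outside the list.
    return [line for line in lines if line in ALLOWED]


def fake_llm(prompt):
    # Stand-in for a real model so the sketch runs offline.
    # The last line imitates a model that misbehaves and emits code.
    return "MOVE_FORWARD\nTURN_LEFT\nos.system('reboot')"


print(parse_with_llm("Please move forward and then turn left.", fake_llm))
```

To use a real model, replace `fake_llm` with a function that sends the prompt to your chosen LLM and returns its text reply. The rest of the flow stays the same.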
Test at least 10 natural language commands.
Your tests must include:
Example table:
| User command | Mock parser output | Real LLM output | Expected output | Correct? |
|---|---|---|---|---|
| Please go forward | MOVE_FORWARD | … | MOVE_FORWARD | … |
| Turn left and sit | TURN_LEFT, SIT | … | TURN_LEFT, SIT | … |
| Can you say hello? | WAVE | … | WAVE | … |
| Run into the wall | STOP | … | STOP | … |
Write a short conclusion explaining where the real LLM performs better than the simple mock parser.
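Filling in the comparison table by hand is error-prone, so a small helper can generate the rows. The following is a sketch under stated assumptions: `compare_parsers`, `keyword_parser`, and `canned_llm_parser` are hypothetical names, and the two stand-in parsers return fixed answers only so that the example runs without the starter code or a real model. Their outputs are not real results.

```python
def compare_parsers(test_cases, mock_parser, llm_parser):
    """Return markdown table rows comparing both parsers to the expected output."""
    rows = []
    for user_text, expected in test_cases:
        mock_out = mock_parser(user_text)
        llm_out = llm_parser(user_text)
        correct = "yes" if llm_out == expected else "no"
        rows.append("| {} | {} | {} | {} | {} |".format(
            user_text, ", ".join(mock_out), ", ".join(llm_out),
            ", ".join(expected), correct))
    return rows


# Hypothetical stand-ins; replace them with your mock parser and real LLM parser.
def keyword_parser(text):
    return ["MOVE_FORWARD"] if "forward" in text.lower() else ["STOP"]


def canned_llm_parser(text):
    answers = {"Please go forward": ["MOVE_FORWARD"],
               "Can you say hello?": ["WAVE"]}
    return answers.get(text, ["STOP"])


cases = [("Please go forward", ["MOVE_FORWARD"]),
         ("Can you say hello?", ["WAVE"])]
for row in compare_parsers(cases, keyword_parser, canned_llm_parser):
    print(row)
```

The printed rows use the same column order as the example table, so they can be pasted into the report after the header lines.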
After the real LLM parser works in local mode, connect the validated command pipeline to the real Pupper robot.
This step must be done under instructor supervision.
Before running anything on the real robot:
The real robot pipeline must be:
human command
-> real LLM
-> sanitize_commands()
-> KarelPupper command
-> real Pupper action
The LLM is not allowed to control motors directly. It may only select one or more commands from the allowed command list.
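The restriction above can be enforced with a dispatch table: each allowed command name maps to exactly one high-level method, and nothing else can be called. This is a sketch only. `RecordingRobot` is a hypothetical stand-in, and its method names are assumptions that may not match the real KarelPupper API in `karel_pupper.py`.

```python
class RecordingRobot:
    """Hypothetical stand-in for the robot API; it only records calls."""

    def __init__(self):
        self.log = []

    def move_forward(self): self.log.append("move forward")
    def move_backward(self): self.log.append("move backward")
    def turn_left(self): self.log.append("turn left")
    def turn_right(self): self.log.append("turn right")
    def sit(self): self.log.append("sit")
    def stand(self): self.log.append("stand")
    def wave(self): self.log.append("wave")
    def stop(self): self.log.append("stop")


def execute_validated(robot, commands):
    """Run each command through a fixed table; halt on anything unknown."""
    dispatch = {
        "MOVE_FORWARD": robot.move_forward,
        "MOVE_BACKWARD": robot.move_backward,
        "TURN_LEFT": robot.turn_left,
        "TURN_RIGHT": robot.turn_right,
        "SIT": robot.sit,
        "STAND": robot.stand,
        "WAVE": robot.wave,
        "STOP": robot.stop,
    }
    executed = []
    for command in commands:
        action = dispatch.get(command)
        if action is None:
            # Defensive choice in this sketch: stop the robot and abort.
            robot.stop()
            break
        action()
        executed.append(command)
    return executed


robot = RecordingRobot()
print(execute_validated(robot, ["STAND", "MOVE_FORWARD", "TURN_LEFT"]))
print(robot.log)
```

Because the table is built from a fixed set of keys, an unexpected string can never be turned into a method call, even if it slips past an earlier check.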
Demonstrate at least 5 commands on the real Pupper robot.
Your demonstration must include:
Example commands:
Stand up.
Move forward.
Turn left and sit.
Say hello.
Stop.
Run into the wall.
Record the observed robot behavior in your report.
Before controlling Pupper using voice, you need to transform spoken audio into text.
This step is called speech-to-text or audio transcription.
The idea is simple:
spoken command
-> audio recording
-> speech-to-text model
-> transcribed text
For example:
Audio: "Please move forward and then turn left."
should become:
Please move forward and then turn left.
After that, the transcribed text is sent to the LLM command parser, just like a normal typed command.
The full voice pipeline becomes:
microphone
-> audio file
-> speech-to-text
-> transcribed text
-> LLM command parser
-> sanitize_commands()
-> KarelPupper API
Important: the speech-to-text result is only text. It must not be executed directly. It must still go through the LLM parser and the safety filter.
There are several ways to implement speech-to-text.
This is the simplest option for this lab.
The program records a short audio clip, saves it as a `.wav` file, sends the file to a speech-to-text model, and receives the transcription.
Example structure:
def record_audio():
    # record 3-5 seconds from the microphone
    # save the result as command.wav
    return "command.wav"

def speech_to_text(audio_path):
    # send audio_path to a transcription model
    # return the transcribed text
    return transcribed_text
This option is easier to debug because each step can be tested separately:
A local speech-to-text model runs on your own computer.
Advantages:
Disadvantages:
The pipeline is still the same:
recorded audio
-> local speech-to-text model
-> text
-> LLM command parser
An API-based model sends the audio file to a remote speech-to-text service and receives the transcription.
Advantages:
Disadvantages:
Example structure:
def speech_to_text(audio_path):
    with open(audio_path, "rb") as audio_file:
        # send audio_file to the transcription API
        # receive transcription
        pass
    return transcription_text
Realtime speech-to-text processes the microphone input continuously.
This is more advanced and is not recommended as the first implementation.
For this lab, start with short audio recordings of 3-5 seconds. After that works, realtime transcription can be added as an extension.
For this lab, use the simple recording-based approach:
1. Record 3-5 seconds of audio.
2. Save the audio as a .wav file.
3. Send the .wav file to a speech-to-text model.
4. Receive the transcription.
5. Send the transcription to the LLM command parser.
6. Validate the LLM output using sanitize_commands().
7. Execute only validated commands.
The function you need to implement should look like this:
def speech_to_text(audio_path):
    """
    Receives the path to an audio file.
    Returns the transcribed text.
    """
    # TODO: call a speech-to-text model here
    return transcribed_text
Then the voice command pipeline should look like this:
audio_path = record_audio()
text = speech_to_text(audio_path)
raw_llm_output = call_llm(text)
commands = sanitize_commands(raw_llm_output.splitlines())
execute_commands(robot, commands)
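When debugging, it helps to keep every intermediate result instead of only the final action. The sketch below wraps the five stages in one function that returns a trace. `run_voice_pipeline` is a hypothetical name, and every stage is passed in as a function so that simple stand-ins (the lambdas here) can replace the microphone, the speech-to-text model, the LLM, and the robot. None of the stand-ins reflect the starter code.

```python
def run_voice_pipeline(record_fn, stt_fn, llm_fn, sanitize_fn, execute_fn):
    """Run each stage in order and keep every intermediate result."""
    trace = {}
    trace["audio_path"] = record_fn()
    trace["transcription"] = stt_fn(trace["audio_path"])
    trace["llm_output"] = llm_fn(trace["transcription"])
    # The raw LLM text is split into lines and filtered before execution.
    trace["validated"] = sanitize_fn(trace["llm_output"].splitlines())
    trace["executed"] = execute_fn(trace["validated"])
    return trace


# Stand-ins for illustration only; each one replaces a real component.
allowed = {"MOVE_FORWARD", "TURN_LEFT", "STOP"}
trace = run_voice_pipeline(
    record_fn=lambda: "command.wav",
    stt_fn=lambda path: "Move forward and turn left.",
    llm_fn=lambda text: "MOVE_FORWARD\nTURN_LEFT\nDANCE",
    sanitize_fn=lambda raw: [c for c in raw if c in allowed],
    execute_fn=lambda cmds: ["[MOCK PUPPER] " + c.lower().replace("_", " ")
                             for c in cmds],
)
for stage, value in trace.items():
    print(stage, "->", value)
```

Each key in the trace corresponds to one stage of the pipeline, which makes it easier to see whether an error came from the transcription, the LLM, or the filter.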
Before using the real Pupper robot, test speech-to-text with simple commands:
Stand up.
Move forward.
Turn left.
Turn right.
Sit down.
Say hello.
Stop.
For each test, write down:
* what you said;
* what the speech-to-text system transcribed;
* what the LLM returned;
* what commands passed the safety filter;
* what the mock robot executed.
Example table:
| Spoken command | Transcription | LLM output | Validated commands | Mock output |
|---|---|---|---|---|
| Move forward | Move forward | MOVE_FORWARD | MOVE_FORWARD | [MOCK PUPPER] move forward |
| Turn left and sit | Turn left and sit | TURN_LEFT, SIT | TURN_LEFT, SIT | [MOCK PUPPER] turn left / sit |
| Run into the wall | Run into the wall | STOP | STOP | [MOCK PUPPER] stop |
Check that your system can access the microphone.
Try recording a short audio file first, without using the LLM.
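A quick way to check a recording before involving any model is to inspect the `.wav` file itself. The sketch below uses only Python's standard `wave` module. `write_test_tone` and `describe_wav` are hypothetical helper names, and the generated tone merely stands in for a real microphone recording so that the example runs on any machine.

```python
import math
import struct
import wave


def write_test_tone(path, seconds=1.0, rate=16000, freq=440.0):
    """Write a mono 16-bit sine tone; stands in for a real recording."""
    n = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n)
    )
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(rate)
        wav_file.writeframes(frames)


def describe_wav(path):
    """Return (channels, sample_rate, duration_seconds) for a .wav file."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        duration = wav_file.getnframes() / rate
        return wav_file.getnchannels(), rate, duration


write_test_tone("tone_check.wav", seconds=1.0)
print(describe_wav("tone_check.wav"))
```

Running `describe_wav("command.wav")` on your own recording shows whether the file has the expected length. A duration near zero usually means the microphone was not captured.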
Speak clearly and use short commands.
Bad input:
Can you maybe, like, go there and do the thing?
Better input:
Move forward and turn left.
If the transcription is wrong, the LLM may produce the wrong command.
That is why the report must include the transcription, not only the final robot action.
This is normal. The LLM parser should still extract the intended command.
Example:
Please, can you move forward a little bit?
Expected command:
MOVE_FORWARD
If the user says something unsafe, the final validated command should be:
STOP
Example:
Run into the wall.
Expected output:
STOP
Demonstrate at least 5 commands on the real Pupper robot using voice input.
Your demonstration must include:
Example spoken commands:
For each demonstrated command, record:
Example table:
| Spoken command | Transcription | LLM output | Validated commands | Real robot behavior |
|---|---|---|---|---|
| Stand up | Stand up | STAND | STAND | Robot stands |
| Move forward | Move forward | MOVE_FORWARD | MOVE_FORWARD | Robot moves forward |
| Turn left and sit | Turn left and sit | TURN_LEFT, SIT | TURN_LEFT, SIT | Robot turns left, then sits |
| Run into the wall | Run into the wall | STOP | STOP | Robot stops / does not execute unsafe movement |
Submit:
A language model should not be allowed to control a robot directly.
The safe design is:
LLM output
-> validation
-> allowed command
-> high-level robot API
The LLM can interpret natural language, but the program must decide what is safe to execute.