====== Lab 9 - Do What I Say: LLM Control for Pupper ======

===== Downloading the llm_lab starter folder =====

For the local LLM robotics activity, use the starter archive provided on OCW:

{{ :app:laboratoare:lab9:llm_lab.zip | Download llm_lab.zip }}

The original repository contains the ROS2/Pupper files. The archive `llm_lab.zip` adds a simplified local sandbox for testing the command pipeline without needing the real robot, ROS2, a microphone, or an OpenAI API key.

Download `llm_lab.zip`, place it in the root folder of the repository, and extract it there:

<code bash>
cd ~/lab_9_fall_2025
unzip llm_lab.zip
ls llm_lab
</code>

After extraction, the repository should contain:

<code>
lab_9_fall_2025/
└── llm_lab/
    ├── commands_config.json
    ├── llm_prompt.txt
    ├── command_parser.py
    ├── karel_pupper.py
    ├── karel_commander.py
    ├── mock_llm.py
    ├── run_karel_test.py
    ├── run_mock_pipeline.py
    ├── README.md
    └── outputs/
</code>

The folder `llm_lab` is used for local testing. The real ROS2/OpenAI/Pupper pipeline remains optional and should be used only when the required hardware and API access are available.

===== 1. Lab idea =====

In this lab, you will build a simple command pipeline for controlling Pupper using natural language.

The pipeline is:

<code>
human command
    -> LLM or mock LLM parser
    -> safe command list
    -> KarelPupper API
    -> mock Pupper action
</code>

The goal is not to let the LLM control motors directly. The LLM must only produce commands from a fixed list.

Allowed commands:

<code>
MOVE_FORWARD
MOVE_BACKWARD
TURN_LEFT
TURN_RIGHT
SIT
STAND
WAVE
STOP
</code>

===== 2. Learning objectives =====

After this lab, you should be able to:

  * explain why a robot command interface needs a safety layer;
  * use a high-level API for robot commands;
  * convert natural language into structured commands;
  * compare a simple keyword parser with an LLM-style parser;
  * explain why the LLM should not execute arbitrary code;
  * test the pipeline locally before using the real robot.

===== 3. Step 1 - Test the KarelPupper API =====

Go to the lab folder:

<code bash>
cd ~/lab_9_fall_2025/llm_lab
</code>

Run:

<code bash>
python3 run_karel_test.py
</code>

Expected output:

<code>
[MOCK PUPPER] stand
[MOCK PUPPER] move forward
[MOCK PUPPER] turn left
[MOCK PUPPER] wave
[MOCK PUPPER] sit
[MOCK PUPPER] stop
</code>

This confirms that the high-level robot API works in mock mode.
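For orientation, a mock backend of this kind can be very small. The sketch below is not the contents of `karel_pupper.py`: the class name `MockPupper`, its method names, and the `log` attribute are assumptions chosen only to reproduce the expected output shown above.

<code python>
class MockPupper:
    """Hypothetical mock robot: prints and records actions instead of moving."""

    def __init__(self):
        self.log = []  # every action is recorded so tests can inspect it

    def _act(self, action):
        self.log.append(action)
        print(f"[MOCK PUPPER] {action}")

    def move_forward(self):
        self._act("move forward")

    def move_backward(self):
        self._act("move backward")

    def turn_left(self):
        self._act("turn left")

    def turn_right(self):
        self._act("turn right")

    def sit(self):
        self._act("sit")

    def stand(self):
        self._act("stand")

    def wave(self):
        self._act("wave")

    def stop(self):
        self._act("stop")


# Same sequence as the expected output of the test script.
pupper = MockPupper()
pupper.stand()
pupper.move_forward()
pupper.turn_left()
pupper.wave()
pupper.sit()
pupper.stop()
</code>

Because the mock only prints and records, the same calling code can later be pointed at a real backend without changing the caller.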

===== 4. Step 2 - Inspect the allowed commands =====

Open:

<code bash>
nano commands_config.json
</code>

You should see the list of commands that the robot is allowed to execute.

The safety rule is simple:

  * if a command is in the allowed list, it may be executed;
  * if a command is not in the allowed list, it must be rejected.
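
The rule above amounts to a set membership test. The following sketch assumes the configuration stores the list under a key named `allowed_commands`; the real schema of `commands_config.json` may differ, so check the file before reusing this. The JSON is inlined so the example runs without the file.

<code python>
import json

# Assumed schema: the real commands_config.json may use different keys.
CONFIG_TEXT = """
{
  "allowed_commands": [
    "MOVE_FORWARD", "MOVE_BACKWARD", "TURN_LEFT", "TURN_RIGHT",
    "SIT", "STAND", "WAVE", "STOP"
  ]
}
"""


def load_allowed(config_text):
    """Return the allow-list as a frozenset for fast, immutable lookups."""
    config = json.loads(config_text)
    return frozenset(config["allowed_commands"])


ALLOWED = load_allowed(CONFIG_TEXT)


def is_allowed(command):
    """Exact match only: no case folding or fuzzy matching at this layer."""
    return command in ALLOWED


print(is_allowed("TURN_LEFT"))  # True
print(is_allowed("JUMP"))       # False
</code>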

===== 5. Step 3 - Inspect the safety parser =====

Open:

<code bash>
nano command_parser.py
</code>

Find the function:

<code python>
sanitize_commands(raw_commands)
</code>

This function checks each generated command against the allowed list.

The LLM output is never trusted directly. It must pass through this safety layer first.
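
To make the idea concrete, here is one way such a filter could be written. It is an illustrative sketch, not the implementation shipped in `command_parser.py`. The normalisation (strip and upper-case), the `MAX_COMMANDS` cap, dropping unknown entries, and the fallback to `STOP` are all assumptions.

<code python>
ALLOWED = frozenset({
    "MOVE_FORWARD", "MOVE_BACKWARD", "TURN_LEFT", "TURN_RIGHT",
    "SIT", "STAND", "WAVE", "STOP",
})
MAX_COMMANDS = 10  # assumed cap so one reply cannot queue an unbounded sequence


def sanitize_commands(raw_commands, allowed=ALLOWED):
    """Sketch: keep only known commands, fall back to STOP if none survive."""
    safe = []
    for item in raw_commands:
        if not isinstance(item, str):
            continue  # non-text entries are rejected outright
        name = item.strip().upper()
        if name in allowed:
            safe.append(name)
    safe = safe[:MAX_COMMANDS]
    return safe or ["STOP"]


print(sanitize_commands([" move_forward ", "SELF_DESTRUCT", "SIT"]))
# ['MOVE_FORWARD', 'SIT']
print(sanitize_commands(["FLY"]))
# ['STOP']
</code>

A stricter policy would reject the whole sequence as soon as one entry is unknown. Which policy is appropriate depends on how much a partially executed plan can hurt.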

===== 6. Step 4 - Run the mock LLM pipeline =====

Run:

<code bash>
python3 run_mock_pipeline.py
</code>

Try commands such as:

<code>
Please move forward and then turn left.
Can you say hello?
Go back and stop.
Run into the wall.
</code>

The mock LLM uses simple keyword logic, but the rest of the pipeline is the same as it would be with a real LLM:

<code>
text -> command parser -> safety filter -> robot API
</code>
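
A keyword parser of the kind described above might look like the following. This is a sketch, not the contents of `mock_llm.py`: the keyword table, the function name `mock_llm`, and the choice to return `STOP` when nothing matches are assumptions.

<code python>
import re

# Hypothetical keyword table: each regular expression maps to one allowed command.
KEYWORDS = [
    (r"\bforwards?\b", "MOVE_FORWARD"),
    (r"\bback(wards?)?\b", "MOVE_BACKWARD"),
    (r"\bleft\b", "TURN_LEFT"),
    (r"\bright\b", "TURN_RIGHT"),
    (r"\bsit\b", "SIT"),
    (r"\bstand\b", "STAND"),
    (r"\b(wave|hello)\b", "WAVE"),
    (r"\bstop\b", "STOP"),
]


def mock_llm(text):
    """Return commands in the order their keywords appear in the text."""
    lowered = text.lower()
    hits = []
    for pattern, command in KEYWORDS:
        for match in re.finditer(pattern, lowered):
            hits.append((match.start(), command))
    hits.sort()  # order by position in the sentence
    return [command for _, command in hits] or ["STOP"]


for sentence in [
    "Please move forward and then turn left.",
    "Can you say hello?",
    "Go back and stop.",
    "Run into the wall.",
]:
    print(f"{sentence!r} -> {mock_llm(sentence)}")
</code>

With this table, "Do not move forward" still yields `MOVE_FORWARD`, because the negation is ignored. That kind of failure is exactly what the comparison with an LLM-style parser is meant to expose.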

===== 7. Step 5 - Inspect the prompt =====

Open:

<code bash>
nano llm_prompt.txt
</code>

The prompt instructs the model to return only valid robot commands. A prompt cannot guarantee this, which is why the output still passes through the safety filter.

Important rules:

  * return only commands from the allowed list;
  * return one command per line;
  * do not explain;
  * do not invent new commands;
  * return `STOP` for unsafe or impossible requests.
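
One way to keep the prompt and the allowed list from drifting apart is to generate the prompt from the list. The template below is a hypothetical illustration of the rules above, not the text of `llm_prompt.txt`; the names `PROMPT_TEMPLATE` and `build_prompt` are assumptions.

<code python>
ALLOWED = [
    "MOVE_FORWARD", "MOVE_BACKWARD", "TURN_LEFT", "TURN_RIGHT",
    "SIT", "STAND", "WAVE", "STOP",
]

# Hypothetical wording; the real prompt file may phrase the rules differently.
PROMPT_TEMPLATE = """You translate user requests into robot commands.
Allowed commands:
{command_list}
Rules:
- Return only commands from the allowed list.
- Return one command per line.
- Do not explain.
- Do not invent new commands.
- Return STOP for unsafe or impossible requests.

User request: {user_text}
Commands:"""


def build_prompt(user_text, allowed=ALLOWED):
    """Fill the template so the prompt always lists the current commands."""
    command_list = "\n".join(f"- {name}" for name in allowed)
    return PROMPT_TEMPLATE.format(command_list=command_list, user_text=user_text)


print(build_prompt("Turn left and sit"))
</code>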

===== 8. Step 6 - Student task =====

Improve the command parser and test it with at least 10 different natural language commands.

Your tests should include:

  * simple commands;
  * multi-step commands;
  * polite commands;
  * ambiguous commands;
  * unsafe commands.

Example table:

^ User command ^ Expected command output ^ Actual command output ^ Correct? ^
| Please go forward | MOVE_FORWARD | ... | ... |
| Turn left and sit | TURN_LEFT, SIT | ... | ... |
| Run into the wall | STOP | ... | ... |
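
Filling in the table by hand becomes tedious after a few cases, so a small harness can help. The sketch below prints rows in the same table syntax. The `stub_parser` is a deliberately naive placeholder and `run_cases` is a hypothetical helper; replace `stub_parser` with the parser you are testing.

<code python>
def stub_parser(text):
    """Placeholder parser for the demo; swap in the real parser under test."""
    lowered = text.lower()
    commands = []
    if "forward" in lowered:
        commands.append("MOVE_FORWARD")
    if "left" in lowered:
        commands.append("TURN_LEFT")
    if "sit" in lowered:
        commands.append("SIT")
    return commands or ["STOP"]


CASES = [
    ("Please go forward", ["MOVE_FORWARD"]),
    ("Turn left and sit", ["TURN_LEFT", "SIT"]),
    ("Run into the wall", ["STOP"]),
    ("Sit, then go forward", ["SIT", "MOVE_FORWARD"]),
]


def run_cases(parser, cases):
    """Return table rows plus the number of cases that matched exactly."""
    rows = ["^ User command ^ Expected command output ^ Actual command output ^ Correct? ^"]
    passed = 0
    for text, expected in cases:
        actual = parser(text)
        ok = actual == expected
        passed += int(ok)
        rows.append(
            f"| {text} | {', '.join(expected)} | {', '.join(actual)} | {'yes' if ok else 'no'} |"
        )
    return rows, passed


rows, passed = run_cases(stub_parser, CASES)
print("\n".join(rows))
print(f"{passed}/{len(CASES)} correct")
</code>

The last case fails on purpose: the stub checks keywords in a fixed order, so it loses the order in which the user asked for the actions.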

===== 9. Optional - Connect to a real LLM =====

The starter archive uses `mock_llm.py`, so it does not require an API key.

Optionally, the instructor may replace `mock_llm.py` with a real LLM call.

The LLM must still return only commands from the allowed list, and the result must still pass through `sanitize_commands`.
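
A sketch of how the backend could be made swappable without weakening the filter: the caller passes in any function that maps text to a reply string, and the reply is always filtered afterwards. The names `interpret`, `chatty_llm` and `broken_llm` are hypothetical, and the inline filter is only a stand-in for `sanitize_commands`. No real LLM API is called here.

<code python>
from typing import Callable, List

ALLOWED = frozenset({
    "MOVE_FORWARD", "MOVE_BACKWARD", "TURN_LEFT", "TURN_RIGHT",
    "SIT", "STAND", "WAVE", "STOP",
})


def interpret(text: str, llm_fn: Callable[[str], str]) -> List[str]:
    """Ask any backend for a reply, then filter it line by line."""
    try:
        reply = llm_fn(text)
    except Exception:
        return ["STOP"]  # a failed call falls back to the safe command
    lines = [line.strip() for line in str(reply).splitlines()]
    safe = [line for line in lines if line in ALLOWED]  # stand-in for the safety filter
    return safe or ["STOP"]


def chatty_llm(text):
    """Fake backend that ignores the 'do not explain' rule and invents a command."""
    return "Sure! Here are the commands:\nMOVE_FORWARD\nJUMP\nSIT"


def broken_llm(text):
    """Fake backend that simulates a network failure."""
    raise TimeoutError("backend unavailable")


print(interpret("go forward and sit", chatty_llm))  # ['MOVE_FORWARD', 'SIT']
print(interpret("go forward", broken_llm))          # ['STOP']
</code>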

===== 10. Optional - Voice input =====

The full version of this activity can use the following pipeline:

<code>
microphone
    -> speech-to-text
    -> natural language text
    -> LLM command parser
    -> safety filter
    -> KarelPupper API
</code>

This part is optional because it requires microphone access and API access.

===== 11. Optional - Upload commands to the real Pupper =====

This step should only be done under instructor supervision.

Before running commands on the real robot:

  * test everything in mock mode;
  * make sure the emergency stop is available;
  * place the robot in a safe open area;
  * use only the high-level KarelPupper API;
  * do not allow the LLM to execute Python code directly.

The instructor may replace the mock implementation in:

<code>
karel_pupper.py
</code>

with ROS2 publishers, services or the real Pupper command API.
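
The last two items of the checklist above can be enforced in code with an explicit dispatch table, so that command names are never passed to `eval`, `exec` or `getattr`. In this sketch the method names on the robot object are assumptions, and `unittest.mock.Mock` stands in for whichever backend (mock or real) is in use.

<code python>
from unittest.mock import Mock


def make_dispatch(robot):
    """Explicit table: only these eight names can ever reach the robot."""
    return {
        "MOVE_FORWARD": robot.move_forward,
        "MOVE_BACKWARD": robot.move_backward,
        "TURN_LEFT": robot.turn_left,
        "TURN_RIGHT": robot.turn_right,
        "SIT": robot.sit,
        "STAND": robot.stand,
        "WAVE": robot.wave,
        "STOP": robot.stop,
    }


def execute(commands, robot):
    """Validate the whole list first, so a bad list causes no motion at all."""
    table = make_dispatch(robot)
    unknown = [name for name in commands if name not in table]
    if unknown:
        raise ValueError(f"refusing to run, unknown commands: {unknown}")
    for name in commands:
        table[name]()


robot = Mock()  # placeholder backend that records every method call
execute(["TURN_LEFT", "SIT"], robot)
print(robot.method_calls)

idle_robot = Mock()
try:
    execute(["SIT", "__import__('os')"], idle_robot)
except ValueError as error:
    print(error)
print(idle_robot.method_calls)  # stays empty: nothing ran
</code>

Because the table is built from the robot object, switching from the mock to a real backend changes only the object passed in, not the dispatcher.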

===== 12. Deliverables =====

Submit:

  * a short explanation of the command pipeline;
  * the list of commands you tested;
  * a table comparing expected and actual command outputs;
  * one example where the keyword/mock parser works well;
  * one example where it fails or behaves too simply;
  * a short explanation of why the safety filter is necessary.

===== 13. What to remember =====

A language model should not be allowed to control a robot directly.

The safe design is:

<code>
LLM output -> validation -> allowed command -> robot API
</code>

The LLM can interpret language, but the program must decide what is safe to execute.
