SCOUT-CAM is a compact, dual-mode reconnaissance rover designed for inspecting hazardous, confined, or hard-to-reach indoor environments where human entry is unsafe, slow, or simply impractical. Typical use cases include:
The rover streams live video from an on-board ESP32-CAM and is steered with a Sony DualShock 4 controller. It seamlessly switches between two operating modes:
The system has three logical layers, separated by the communication medium they use:
In manual mode the ESP8266 is essentially a translator: it turns network commands into signals for the servos and the DC motors. In autonomous mode it ignores the drive commands from the PC and instead runs a simple behaviour:
The operator can take control back at any moment by pressing the mode-toggle button on the PS4 controller — useful when the rover gets stuck or makes a poor decision.
To address the elephant in the room: the PS4 controller input passes through the Host PC instead of going straight to the ESP32-CAM because the module has only one antenna. Running Bluetooth Classic and Wi-Fi at the same time would mean time-slicing that antenna between the two protocols. This isn't a problem in itself, but the quality of the live feed (already pretty low) would take a nosedive, and the controller inputs would pick up serious lag.
Even after control of the sensor, servos and motors was moved to the newly added ESP8266, the issue persists: that microcontroller has neither built-in Bluetooth nor enough free pins to attach a Bluetooth module. While an ESP32 (with integrated Bluetooth) might have been preferable in this situation, time and budget constraints did not allow purchasing another microcontroller.
Another design choice worth mentioning: the 9 V battery originally intended as the MB102's input ended up being swapped for a second 6×AA pack. A regular 9 V battery simply couldn't source the peak current drawn by the ESP32-CAM while streaming, plus the servos and the ultrasonic sensor. The 6×AA stack supplies the same ~9 V nominal but with much more current headroom, and the live feed has been stable since.
| Component |
|---|
| Adafruit Feather HUZZAH ESP8266 |
| ESP32-CAM (AI-Thinker) |
| 3-pin Makeblock ultrasonic sensor |
| 2× resistors (5 V → 3.3 V divider for the ultrasonic SIG line) |
| 2× Makeblock Analog servos |
| L298N dual H-bridge motor driver |
| 2× DC motors |
| MB102 power supply module |
| 9 V battery (MB102 input) |
| 3.7 V Li-Po battery (ESP8266 logic) |
| 6× AA battery pack (motor supply) |
| Host PC |
| Sony DualShock 4 |
| ESP32-CAM-MB USB to serial adapter for flashing |
ESP8266 (Feather HUZZAH):

| Pin | Connection |
|---|---|
| GPIO 4 | Servo 1 — signal |
| GPIO 5 | Servo 2 — signal |
| GPIO 2 | Ultrasonic SIG (via 5 V → 3.3 V voltage divider) |
| GPIO 16 | L298N ENA |
| GPIO 14 | L298N ENB |
| GPIO 0 | L298N IN1 |
| GPIO 15 | L298N IN2 |
| GPIO 13 | L298N IN3 |
| GPIO 12 | L298N IN4 |
| BAT/EN | 3.7 V Li-Po battery |
| GND | Common ground |
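
The divider on the ultrasonic SIG line scales the sensor's 5 V output down to a level the ESP8266 input can tolerate. The resistor values are not listed in this document, so the 1 kΩ / 2 kΩ pair below is purely an illustrative assumption, as is the helper name `divider_out`. The sketch only applies the standard unloaded-divider formula:

```python
def divider_out(v_in: float, r_top: float, r_bottom: float) -> float:
    """Output voltage of an unloaded two-resistor divider.

    r_top sits between the source and the output node,
    r_bottom between the output node and ground.
    """
    return v_in * r_bottom / (r_top + r_bottom)


# Hypothetical values: 1 kOhm on top, 2 kOhm to ground (not taken from the build).
v_sig = divider_out(5.0, 1_000, 2_000)
print(f"SIG level at the ESP8266 pin: {v_sig:.2f} V")
```

Any resistor pair with roughly a 1:2 ratio lands near 3.3 V; the absolute values mainly trade current draw against signal edge speed.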
ESP32-CAM:

| Pin | Connection |
|---|---|
| 5V pin | MB102 5 V rail |
| GND | Common ground |
| (all other pins) | Not connected |
The host runs a Python script that reads the DualShock 4 via pygame/SDL2, parses the inputs (including clamping the left thumbstick values), maps the right stick to two servo angles, and POSTs the snapshot as JSON to the ESP8266 over the LAN at up to 20 Hz. The Triangle button hits a separate /mode/toggle endpoint. Identical snapshots aren't re-sent: the host only POSTs when something actually changes, so the rover's tiny HTTP stack isn't drowning in duplicates while the operator's hands sit still.
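
The clamp, stick-to-angle mapping, and send-on-change logic described above can be sketched roughly as follows. This is not the project's actual script: the JSON field names (`x`, `y`, `pan`, `tilt`), the linear 0 to 180 degree mapping, and the `ChangeOnlySender` class are assumptions made for illustration, and the controller read (pygame) and the HTTP POST are left out so the core logic stays self-contained.

```python
import json
from typing import Callable, Optional


def clamp(value: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Limit a raw axis reading to the range [lo, hi]."""
    return max(lo, min(hi, value))


def stick_to_angle(axis: float) -> int:
    """Map a stick axis in [-1, 1] linearly onto a servo angle in [0, 180]."""
    return round((clamp(axis) + 1.0) * 90.0)


def build_snapshot(left_x: float, left_y: float,
                   right_x: float, right_y: float) -> dict:
    """Bundle one controller reading; the field names are hypothetical."""
    return {
        "x": round(clamp(left_x), 2),
        "y": round(clamp(left_y), 2),
        "pan": stick_to_angle(right_x),
        "tilt": stick_to_angle(right_y),
    }


class ChangeOnlySender:
    """Forward a snapshot only when it differs from the last one sent."""

    def __init__(self, post: Callable[[str], None]) -> None:
        self._post = post                  # e.g. a function doing the HTTP POST
        self._last: Optional[str] = None   # JSON body of the last sent snapshot

    def submit(self, snapshot: dict) -> bool:
        """Send if changed; return True when a request was actually made."""
        body = json.dumps(snapshot, sort_keys=True)
        if body == self._last:
            return False
        self._post(body)
        self._last = body  # only remembered once the post call has returned
        return True
```

In a real loop, `submit` would be called once per 50 ms tick with the latest controller state, and `post` would wrap the request to the /control endpoint. Rounding the drive axes before comparing keeps analog stick jitter from defeating the duplicate check.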
The ESP8266 sketch is a small Arduino program that runs an HTTP server with two endpoints (/control, /mode/toggle). In manual mode it just unpacks the JSON into direction-and-duty signals for the L298N plus a couple of Servo.write() calls, with a 1-second host-silence failsafe that kills the motors if the PC stops talking. In autonomous mode it ignores the incoming /control packets entirely and runs a small state machine: ping the ultrasonic sensor every 60 ms; drive forward at full PWM while the path is clear; pivot left briefly when an obstacle is detected closer than 6 cm; re-sample; if still blocked, pivot right for twice as long; repeat until something opens up. Readings closer than 3 cm are dropped as sensor ringdown, since they were the main culprit for the rover thinking it had hit a wall when it hadn't.
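
The autonomous behaviour can be modelled as a small state machine. The sketch below is written in Python for readability, whereas the real firmware is an Arduino sketch. The 6 cm and 3 cm thresholds come from the description above; the pivot length in pings (`PIVOT_TICKS`), the class and state names, and the choice to hold the current action when a reading is dropped are assumptions.

```python
from enum import Enum
from typing import Optional

OBSTACLE_CM = 6.0   # from the text: pivot when an obstacle is closer than this
RINGDOWN_CM = 3.0   # from the text: closer readings are discarded as ringdown
PIVOT_TICKS = 5     # hypothetical length of a "brief" left pivot, in 60 ms pings


class State(Enum):
    FORWARD = "forward"
    PIVOT_LEFT = "pivot_left"
    PIVOT_RIGHT = "pivot_right"


def valid_reading(distance_cm: float) -> Optional[float]:
    """Return the reading, or None if it looks like sensor ringdown."""
    return None if distance_cm < RINGDOWN_CM else distance_cm


class Autopilot:
    """One call to step() corresponds to one ultrasonic ping."""

    def __init__(self, pivot_ticks: int = PIVOT_TICKS) -> None:
        self.pivot_ticks = pivot_ticks
        self.state = State.FORWARD
        self.ticks_left = 0  # pings remaining in the current pivot

    def step(self, distance_cm: float) -> State:
        """Advance by one ping and return the action to perform."""
        if self.ticks_left > 0:
            # Mid-pivot: finish the manoeuvre before re-sampling.
            self.ticks_left -= 1
            return self.state

        reading = valid_reading(distance_cm)
        if reading is None:
            # Dropped sample (assumption): keep the current action.
            return self.state

        if reading >= OBSTACLE_CM:
            self.state = State.FORWARD
        elif self.state is State.PIVOT_LEFT:
            # Still blocked after a left pivot: go right for twice as long.
            self.state = State.PIVOT_RIGHT
            self.ticks_left = 2 * self.pivot_ticks - 1
        else:
            # Newly blocked, or still blocked after a right pivot: go left.
            self.state = State.PIVOT_LEFT
            self.ticks_left = self.pivot_ticks - 1
        return self.state
```

Feeding `step` a list of recorded distances makes it possible to replay a run on the PC and check the pivot sequence without the rover on the floor.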
The ESP32-CAM, meanwhile, runs the stock CameraWebServer example from the Arduino-ESP32 distribution, unchanged: it joins the same Wi-Fi network and serves its web interface on port 80, with the MJPEG stream on port 81. The host pulls that stream over HTTP and shows it in its own window. The control plane (PC ↔ ESP8266) and the video plane (PC ← ESP32-CAM) are entirely independent: neither side knows about the other, which means a hiccup on the video side can't stall the controls, and vice versa.
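
One common way to pull individual frames out of an MJPEG-over-HTTP stream is to scan the incoming bytes for the JPEG start and end markers. The sketch below shows that technique in isolation; it is not necessarily how the host script here does it, and the function name `extract_frames` is hypothetical. The stream URL and the display step are deliberately left out.

```python
from typing import List, Tuple

JPEG_START = b"\xff\xd8"  # JPEG start-of-image marker
JPEG_END = b"\xff\xd9"    # JPEG end-of-image marker


def extract_frames(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split complete JPEG frames out of a byte buffer.

    Returns (frames, remainder). The remainder holds any partial frame and
    should be prepended to the next chunk read from the stream.
    """
    frames: List[bytes] = []
    while True:
        start = buffer.find(JPEG_START)
        if start == -1:
            # No frame start yet; keep the last byte in case the two-byte
            # marker is split across chunk boundaries.
            return frames, buffer[-1:]
        end = buffer.find(JPEG_END, start + 2)
        if end == -1:
            # Frame started but not finished: wait for more data.
            return frames, buffer[start:]
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]
```

A reader loop would repeatedly read a chunk from the HTTP response, call `extract_frames(remainder + chunk)`, and hand each returned frame to whatever decodes and displays it. Marker scanning ignores the multipart boundaries and headers entirely, which keeps the parser short at the cost of assuming the frames carry no embedded thumbnails.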