SCOUT-CAM: Remote Reconnaissance Rover

Introduction

SCOUT-CAM is a compact, dual-mode reconnaissance rover designed for inspecting hazardous, confined, or hard-to-reach indoor environments where human entry is unsafe, slow, or simply impractical. Typical use cases include:

The rover streams live video from an on-board ESP32-CAM and is steered with a Sony DualShock 4 controller. It switches between two operating modes, manual and autonomous, at the press of a button.

General Description

The system has three logical layers, separated by the communication medium they use:

  1. PS4 controller (Bluetooth): provides the human interface. Left stick → forward/turn. Right stick → camera pan/tilt. Triangle button → toggle autonomous/manual mode. Cross button → emergency stop.
  2. Host computer (Wi-Fi): receives controller events over Bluetooth, converts them into high-level commands, and forwards them as HTTP requests to the ESP8266. The same script also pulls the MJPEG video stream from the ESP32-CAM and displays it in a window for the operator.
  3. Rover (ESP8266 + ESP32-CAM): the ESP8266 parses incoming commands, drives the L298N H-bridge and the two servos, samples the ultrasonic sensor, and runs the obstacle-avoidance state machine when in autonomous mode. The ESP32-CAM runs independently alongside it, doing nothing but serving the live camera feed over HTTP.

In manual mode the ESP8266 is essentially a translator: it turns network commands into signals for the servos and the DC motors. In autonomous mode it ignores the drive commands from the PC and instead runs a simple behaviour:

  1. Go forward.
  2. If an obstacle is detected within 20 cm, stop.
  3. Pan the camera left and right, take a distance reading at each side.
  4. Turn toward the side with more free space; resume forward motion.
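
The four steps above can be read as a small decision function. The following is only a sketch in Python for readability (the rover's firmware is an Arduino sketch); the function name, the action labels, and the tie-break toward the right are assumptions made for illustration.

```python
OBSTACLE_CM = 20.0  # stop threshold quoted in step 2


def next_action(front_cm, left_cm=None, right_cm=None):
    """Pick the next high-level action for one pass of the behaviour.

    front_cm is the reading straight ahead; left_cm and right_cm are the
    readings taken after panning, or None if no scan has been done yet.
    """
    if front_cm > OBSTACLE_CM:
        return "forward"            # step 1: path is clear
    if left_cm is None or right_cm is None:
        return "stop_and_scan"      # steps 2-3: stop, then look both ways
    # Step 4: turn toward the side with more free space.
    # Ties go right here, which is an arbitrary choice for this sketch.
    return "turn_left" if left_cm > right_cm else "turn_right"
```

For example, `next_action(15, left_cm=40, right_cm=10)` picks the left turn because the left reading is the larger one.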

The operator can take control back at any moment by pressing the mode-toggle button on the PS4 controller — useful when the rover gets stuck or makes a poor decision.

Hardware Design

To address the elephant in the room: the PS4 controller input passes through the host PC instead of going straight to the ESP32-CAM because the module has only one antenna. Running Bluetooth Classic and Wi-Fi at the same time would mean time-slicing that antenna between the two protocols. This isn't a problem in itself, but the video quality of the live feed (already fairly low) would take a nosedive, and the controller inputs would suffer serious lag.

Even after control of the sensor, servos and motors was moved to the newly added ESP8266, the issue persists: that microcontroller has neither built-in Bluetooth nor enough free pins to attach a Bluetooth module. While an ESP32 (with integrated Bluetooth) might have been preferable in this situation, time and budget constraints did not allow purchasing another microcontroller.

Another design choice worth mentioning: the 9 V battery originally intended as the MB102's input ended up being swapped for a second 6×AA pack. A regular 9 V battery simply couldn't source the peak current that the ESP32-CAM draws while streaming, on top of the servos and the ultrasonic sensor. The 6×AA stack supplies the same ~9 V nominal but with much more current headroom, and the live feed has been stable since.

Hardware Modules

Adafruit Feather HUZZAH ESP8266
ESP32-CAM (AI-Thinker)
3-pin Makeblock ultrasonic sensor
2× resistors (5 V → 3.3 V divider for the ultrasonic SIG line)
2× Makeblock analog servos
L298N dual H-bridge motor driver
2× DC motors
MB102 power supply module
6× AA battery pack (MB102 input, replacing the original 9 V battery)
3.7 V Li-Po battery (ESP8266 logic)
6× AA battery pack (motor supply)
Host PC
Sony DualShock 4
ESP32-CAM-MB USB to serial adapter for flashing

Pin Connections

ESP8266 (Feather HUZZAH):

Pin        Connected to
GPIO 4     Servo 1 signal
GPIO 5     Servo 2 signal
GPIO 2     Ultrasonic SIG (via 5 V → 3.3 V voltage divider)
GPIO 16    L298N ENA
GPIO 14    L298N ENB
GPIO 0     L298N IN1
GPIO 15    L298N IN2
GPIO 13    L298N IN3
GPIO 12    L298N IN4
BAT/EN     3.7 V Li-Po battery
GND        Common ground
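
The 5 V → 3.3 V divider on the ultrasonic SIG line follows the usual two-resistor formula, V_out = V_in × R_bottom / (R_top + R_bottom). The resistor values are not stated here, so the 1 kΩ / 2 kΩ pair below is only an assumed example of a ratio that lands near 3.3 V, and the function name is hypothetical.

```python
def divider_out(v_in, r_top, r_bottom):
    """Output voltage of an unloaded two-resistor divider.

    r_top sits between the input and the tap, r_bottom between the tap
    and ground; both are in ohms.
    """
    return v_in * r_bottom / (r_top + r_bottom)


# Assumed example values, not taken from the build: 1 kOhm over 2 kOhm.
v_sig = divider_out(5.0, r_top=1000, r_bottom=2000)
print(f"{v_sig:.2f} V")  # prints "3.33 V"
```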

ESP32-CAM:

Pin                 Connected to
5V                  MB102 5 V rail
GND                 Common ground
(all other pins)    Not connected

Labs used

Software Design

The host runs a Python script that reads the DualShock 4 via pygame/SDL2, parses the inputs (including clamping the left thumbstick values), maps the right stick to two servo angles, and POSTs the resulting snapshot as JSON to the ESP8266 over the LAN at up to 20 Hz. The Triangle button hits a separate /mode/toggle endpoint. Identical snapshots aren't re-sent: the host only POSTs when something actually changes, so the rover's tiny HTTP stack isn't drowning in duplicates while the operator's hands sit still.
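
As a rough illustration of the snapshot logic described above, the sketch below clamps the stick values, maps the right stick to servo angles, and suppresses duplicate sends. It leaves out pygame and the HTTP call so it can run on its own. The JSON field names, the dead-zone value, the 0 to 180 degree servo range, and the `ChangeOnlySender` helper are all assumptions, not the project's actual code.

```python
import json

DEADZONE = 0.08  # assumed value; small stick noise is treated as zero


def clamp(value, lo=-1.0, hi=1.0):
    """Limit an axis value to the range [lo, hi]."""
    return max(lo, min(hi, value))


def apply_deadzone(value, deadzone=DEADZONE):
    """Return 0.0 for values inside the dead zone, else the value itself."""
    return 0.0 if abs(value) < deadzone else value


def stick_to_angle(axis, min_deg=0, max_deg=180):
    """Map a stick axis in [-1, 1] linearly to a servo angle in degrees."""
    axis = clamp(axis)
    return round(min_deg + (axis + 1.0) / 2.0 * (max_deg - min_deg))


def build_snapshot(lx, ly, rx, ry):
    """Build the command dict from raw left (lx, ly) and right (rx, ry) axes."""
    return {
        # SDL reports stick-up as negative, so the sign is flipped here.
        "throttle": round(apply_deadzone(clamp(-ly)), 2),
        "turn": round(apply_deadzone(clamp(lx)), 2),
        "pan": stick_to_angle(rx),
        "tilt": stick_to_angle(ry),
    }


class ChangeOnlySender:
    """Forward a snapshot to `send` only if it differs from the last one sent."""

    def __init__(self, send):
        self._send = send   # callable taking the JSON string, e.g. an HTTP POST
        self._last = None

    def submit(self, snapshot):
        payload = json.dumps(snapshot, sort_keys=True)
        if payload == self._last:
            return False    # identical to the previous snapshot: skip it
        self._send(payload)
        self._last = payload
        return True
```

In the real script, `send` would wrap the POST to the ESP8266's /control endpoint, and `submit` would be called once per polling tick.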

The ESP8266 sketch is a small Arduino program that runs an HTTP server with two endpoints (/control, /mode/toggle). In manual mode it just unpacks the JSON into direction-and-duty signals for the L298N plus a couple of Servo.write() calls, with a 1-second host-silence failsafe that kills the motors if the PC stops talking. In autonomous mode it ignores the incoming /control packets entirely and runs a small state machine: ping the ultrasonic sensor every 60 ms and drive forward at full PWM while the path is clear; when an obstacle is detected closer than 6 cm, pivot left briefly and re-sample; if still blocked, pivot right for twice as long; repeat until something opens up. Readings closer than 3 cm are dropped as sensor ringdown, since they were the main culprit for the rover thinking it had hit a wall when it hadn't.
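
The firmware itself is an Arduino sketch, but its decision logic can be modelled in a few lines of Python. The sketch below is an approximation under assumptions: the class, function, and state names are hypothetical, and the echo-time conversion uses the usual speed of sound of about 343 m/s. It covers the ringdown filter, the host-silence failsafe check, and the forward / pivot-left / pivot-right cycle.

```python
RINGDOWN_CM = 3.0      # readings below this are discarded as ringdown
OBSTACLE_CM = 6.0      # readings below this count as an obstacle
FAILSAFE_S = 1.0       # host-silence timeout in manual mode
SOUND_CM_PER_US = 0.0343

# Relative pivot durations: the right pivot lasts twice as long as the left.
PIVOT_UNITS = {"pivot_left": 1, "pivot_right": 2}


def echo_to_cm(echo_us):
    """Convert a round-trip echo time in microseconds to a one-way distance."""
    return echo_us * SOUND_CM_PER_US / 2.0


def host_silent(now_s, last_packet_s):
    """True when no /control packet has arrived within the failsafe window."""
    return (now_s - last_packet_s) > FAILSAFE_S


class Avoider:
    """Model of the cycle: forward, pivot left, pivot right, repeat."""

    def __init__(self):
        self.state = "forward"

    def step(self, distance_cm):
        """Feed one ultrasonic sample and return the resulting state."""
        if distance_cm < RINGDOWN_CM:
            return self.state           # ringdown artefact: ignore the sample
        if distance_cm >= OBSTACLE_CM:
            self.state = "forward"      # path is clear
        elif self.state == "pivot_left":
            self.state = "pivot_right"  # still blocked after the left pivot
        else:
            self.state = "pivot_left"   # newly blocked, or restarting the cycle
        return self.state
```

Feeding the samples 50, 5, 2, 5, 30 (in cm) through `step` walks the model through forward, pivot_left, pivot_left (sample ignored), pivot_right, and back to forward.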

The ESP32-CAM, meanwhile, runs the stock CameraWebServer example from the Arduino-ESP32 distribution, unchanged: it joins the same Wi-Fi network and serves the camera over HTTP (in the stock example, the web interface is on port 80 and the MJPEG stream is at /stream on port 81). The host pulls that stream over HTTP and shows it in its own window. The control plane (PC ↔ ESP8266) and the video plane (PC ← ESP32-CAM) are entirely independent. Neither side knows about the other, which means a hiccup on the video side can't stall the controls, and vice versa.
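
One common way to pull frames out of an MJPEG stream on the host is to buffer the incoming bytes and cut them at the JPEG start (FF D8) and end (FF D9) markers. The sketch below shows that marker-scanning approach; it is not necessarily how the project's script does it, and the function name is hypothetical. It assumes the frames carry no embedded thumbnails, which could otherwise contain their own end markers.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def extract_frames(buffer):
    """Split complete JPEG frames out of a byte buffer.

    Returns (frames, remainder), where remainder holds the bytes of a
    frame that has started but not yet finished and should be prepended
    to the next chunk read from the stream.
    """
    frames = []
    while True:
        start = buffer.find(SOI)
        if start == -1:
            break
        end = buffer.find(EOI, start + 2)
        if end == -1:
            break
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]

    start = buffer.find(SOI)
    if start != -1:
        remainder = buffer[start:]   # partial frame: keep from its SOI onward
    elif buffer.endswith(b"\xff"):
        remainder = buffer[-1:]      # may be the first half of a split marker
    else:
        remainder = b""              # only multipart headers left: drop them
    return frames, remainder
```

Each chunk read from the HTTP response would be appended to the previous remainder and passed through `extract_frames`; the returned frames can then be decoded and displayed.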