

SCOUT-CAM: Remote Reconnaissance Rover

Introduction

SCOUT-CAM is a compact, dual-mode reconnaissance rover designed for inspecting hazardous, confined, or hard-to-reach indoor environments where human entry is unsafe, slow, or simply impractical. Typical use cases include:

  • Checking suspicious packages or unattended bags from a safe distance.
  • Surveying smoke-filled rooms before firefighter entry.
  • Inspecting crawl spaces and attics.
  • Scouting around collapsed furniture or shelving in search-and-rescue training exercises.
  • Looking for lost pets, leaks, or chewed cables in places a human cannot easily reach.

The rover streams live video from an on-board ESP32-CAM and is steered with a Sony DualShock 4 controller. It seamlessly switches between two operating modes:

  • Tele-operated mode: the operator drives the rover and aims the camera using the PS4 controller's analog sticks.
  • Autonomous mode: when activated, the rover uses an ultrasonic sensor to avoid obstacles and keep exploring on its own, so it does not sit idle in a dangerous spot.

General Description

The system has three logical layers, separated by the communication medium they use:

  1. PS4 controller (Bluetooth): provides the human interface. Left stick → forward/turn. Right stick → camera pan/tilt. Triangle button → toggle autonomous/manual mode. Cross button → emergency stop.
  2. Host computer (Wi-Fi): receives controller events over Bluetooth, converts them into high-level commands, and forwards them as HTTP requests to the ESP8266 over Wi-Fi. The same script also pulls the MJPEG video stream from the ESP32-CAM and displays it in a window for the operator.
  3. Rover (ESP8266 + ESP32-CAM): the ESP8266 parses incoming commands, drives the L298N H-bridge and the two servos, samples the ultrasonic sensor, and runs the obstacle-avoidance state machine when in autonomous mode. The ESP32-CAM runs independently alongside it, doing nothing but serving the live camera feed over HTTP.

In manual mode the ESP8266 is essentially a translator: it turns network commands into signals for the servos and the DC motors. In autonomous mode it ignores the drive commands from the PC and instead runs a simple behaviour:

  1. Go forward.
  2. If an obstacle is detected within 20 cm, stop.
  3. Pan the camera left and right, take a distance reading at each side.
  4. Turn toward the side with more free space; resume forward motion.
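
The four steps above can be sketched as a pure decision function. This is a minimal model in Python, not the rover firmware (which is an Arduino sketch); the function names, the action strings, and the tie-breaking rule (equal readings turn left) are assumptions made for illustration.

```python
from typing import Optional

OBSTACLE_CM = 20.0  # stop threshold from step 2


def choose_turn(left_cm: float, right_cm: float) -> str:
    """Step 4: turn toward the side with more free space.

    Ties go left, which is an assumption; the text does not say.
    """
    return "turn_left" if left_cm >= right_cm else "turn_right"


def next_action(front_cm: float,
                left_cm: Optional[float] = None,
                right_cm: Optional[float] = None) -> str:
    """Return the next action given the latest distance readings."""
    if front_cm > OBSTACLE_CM:
        return "forward"            # step 1: path is clear
    if left_cm is None or right_cm is None:
        return "stop_and_scan"      # steps 2-3: stop, then sample both sides
    return choose_turn(left_cm, right_cm)
```

For example, `next_action(15.0)` asks for a scan, and once both side readings exist, `next_action(15.0, 30.0, 80.0)` picks the right-hand side.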

The operator can take control back at any moment by pressing the mode-toggle button on the PS4 controller — useful when the rover gets stuck or makes a poor decision.

Hardware Design

To address the elephant in the room: the PS4 controller input passes through the host PC instead of going straight to the ESP32-CAM because the module has only one antenna. Running Bluetooth (Classic) and Wi-Fi at the same time would mean time-slicing that antenna between the two protocols. This isn't a problem in itself, but the video quality of the live feed (already pretty low) would take a nosedive, and the controller inputs would pick up serious lag.

Even after control of the sensor, servos and motors was moved to the newly added ESP8266, the issue persists: that microcontroller has neither built-in Bluetooth nor enough free pins to attach a Bluetooth module. While an ESP32 (with integrated Bluetooth) might have been preferable in this situation, time and budget constraints did not allow purchasing another microcontroller.

Another design choice worth mentioning: the 9 V battery originally intended as the MB102's input ended up being swapped for a second 6×AA pack. A regular 9 V battery simply couldn't source the peak current that the ESP32-CAM draws while streaming, on top of the servos and the ultrasonic sensor. The 6×AA stack supplies the same ~9 V nominal but with much more current headroom, and the live feed has been stable since.

Hardware Modules

Adafruit Feather HUZZAH ESP8266
ESP32-CAM (AI-Thinker)
3-pin Makeblock ultrasonic sensor
2× resistors (5 V → 3.3 V divider for the ultrasonic SIG line)
2× Makeblock Analog servos
L298N dual H-bridge motor driver
2× DC motors
MB102 power supply module
6× AA battery pack (MB102 input, replacing the original 9 V battery)
3.7 V Li-Po battery (ESP8266 logic)
6× AA battery pack (motor supply)
Host PC
Sony DualShock 4
ESP32-CAM-MB USB to serial adapter for flashing
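
The two resistors in the list form a voltage divider that brings the sensor's 5 V SIG level down to something the ESP8266 tolerates. The list does not give the resistor values, so the 1 kΩ / 2 kΩ pair below is an assumption chosen only to show the arithmetic.

```python
def divider_out(v_in: float, r_top: float, r_bottom: float) -> float:
    """Output of an unloaded resistive divider, measured across r_bottom."""
    return v_in * r_bottom / (r_top + r_bottom)


# Assumed values, not taken from the build: 1 kΩ on top, 2 kΩ to ground.
v_sig = divider_out(5.0, 1_000.0, 2_000.0)
print(f"{v_sig:.2f} V")  # prints "3.33 V"
```

Any pair with the same 1:2 ratio gives the same output; higher values draw less current from the SIG line.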

Pin Connections

ESP8266 (Feather HUZZAH):

Pin       Connection
GPIO 4    Servo 1 signal
GPIO 5    Servo 2 signal
GPIO 2    Ultrasonic SIG (via 5 V → 3.3 V voltage divider)
GPIO 16   L298N ENA
GPIO 14   L298N ENB
GPIO 0    L298N IN1
GPIO 15   L298N IN2
GPIO 13   L298N IN3
GPIO 12   L298N IN4
BAT/EN    3.7 V Li-Po battery
GND       Common ground

ESP32-CAM:

Pin                Connection
5V                 MB102 5 V rail
GND                Common ground
(all other pins)   Not connected

Labs used

  • GPIO: driving the servo signal pins, the L298N direction inputs, and the bidirectional ultrasonic SIG line.
  • PWM: generating the speed-control duty cycles for the L298N's ENA and ENB inputs that set how fast each DC motor turns.
  • USART: the Serial Monitor output over USB used for general debugging, plus the boot-time Wi-Fi connection status prints, which is how we know the ESP8266 actually came up and what IP it grabbed.
  • I2C (SCCB variant): runs internally on the ESP32-CAM module between the ESP32 and the OV2640 image sensor. We don't touch this bus ourselves, since the stock camera firmware drives it, but it's the protocol used to configure the sensor's registers. The image data itself leaves the sensor over a separate parallel interface and lands in the framebuffer that is served as the MJPEG stream.
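
To make the GPIO and PWM items concrete, here is a minimal Python model of how one signed motor command could be split into the two L298N direction inputs and an enable duty. It is a sketch, not the firmware: the `ChannelDrive` type, the function name, and the choice of which input is high for "forward" are assumptions, and the real polarity depends on how the motor is wired.

```python
from typing import NamedTuple


class ChannelDrive(NamedTuple):
    in_a: bool  # IN1 for motor A, IN3 for motor B
    in_b: bool  # IN2 for motor A, IN4 for motor B
    duty: int   # 0..255 PWM duty on ENA or ENB


def l298n_drive(command: int) -> ChannelDrive:
    """Translate a signed command in [-255, 255] into pins plus duty."""
    command = max(-255, min(255, command))  # clamp out-of-range input
    if command > 0:
        return ChannelDrive(True, False, command)    # assumed forward polarity
    if command < 0:
        return ChannelDrive(False, True, -command)   # reversed polarity
    return ChannelDrive(False, False, 0)             # enable low: motor coasts
```

On the microcontroller, the two booleans would go to digital outputs and `duty` to the PWM output feeding the enable pin.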

Software Design

The host runs a Python script that reads the DualShock 4 via pygame/SDL2 and mixes the left stick into a pair of signed L298N motor PWM commands using a standard tank-drive mix (left wheel = forward + turn, right wheel = forward - turn, each clipped to ±255). It maps the right stick to two servo angles and POSTs the resulting snapshot as JSON to the ESP8266 over the LAN at up to 20 Hz. The Triangle button hits a separate /mode/toggle endpoint, and Cross hits /estop. Identical snapshots aren't re-sent: the host only POSTs when something actually changes, so the rover's tiny HTTP stack isn't drowning in duplicates while the operator's hands sit still.
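
The mixing and change-detection logic described above can be sketched as follows. This is an illustration under assumptions, not the original script: the JSON field names (`left`, `right`, `pan`, `tilt`), the 0..180 degree servo range, and the `SnapshotSender` helper are all hypothetical, and stick axes are taken to be floats in [-1.0, 1.0].

```python
import json
from typing import Optional, Tuple


def clip(value: float, limit: int = 255) -> int:
    """Round to an integer and clamp to [-limit, limit]."""
    return int(max(-limit, min(limit, round(value))))


def tank_mix(forward: float, turn: float) -> Tuple[int, int]:
    """Tank-drive mix: (left, right) PWM commands in [-255, 255]."""
    left = clip((forward + turn) * 255)
    right = clip((forward - turn) * 255)
    return left, right


def stick_to_angle(axis: float) -> int:
    """Map a stick axis in [-1.0, 1.0] linearly onto 0..180 degrees (assumed range)."""
    axis = max(-1.0, min(1.0, axis))
    return int(round((axis + 1.0) * 90))


class SnapshotSender:
    """Remembers the last payload so identical snapshots are not re-sent."""

    def __init__(self) -> None:
        self._last: Optional[str] = None

    def payload_if_changed(self, left: int, right: int,
                           pan: int, tilt: int) -> Optional[str]:
        # Field names are hypothetical; the source does not list them.
        payload = json.dumps(
            {"left": left, "right": right, "pan": pan, "tilt": tilt},
            sort_keys=True,
        )
        if payload == self._last:
            return None  # nothing changed, skip the POST
        self._last = payload
        return payload
```

A polling loop would call `tank_mix` and `stick_to_angle` on each tick and POST only when `payload_if_changed` returns a string.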

The ESP8266 sketch is a small Arduino program that runs an HTTP server with three endpoints (/control, /estop, /mode/toggle). In manual mode it just unpacks the JSON into direction-and-duty signals for the L298N plus a couple of Servo.write() calls, with a 1-second host-silence failsafe that kills the motors if the PC stops talking. In autonomous mode it ignores the incoming /control packets entirely and runs a small state machine: it pings the ultrasonic sensor every 60 ms and drives forward at full PWM while the path is clear. When an obstacle is detected closer than 6 cm, it pivots left briefly and re-samples; if still blocked, it pivots right for twice as long, and it repeats this until something opens up. Readings closer than 3 cm are dropped as sensor ringdown, since they were the main culprit for the rover thinking it had hit a wall when it hadn't.
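
The state machine and the failsafe in this paragraph can be modelled in a few lines of Python. This only mirrors the logic; the real code is an Arduino sketch. The 300 ms left-pivot duration is an assumption (the text only says "briefly"), as are the class and function names and the choice to treat a dropped reading as "not blocked".

```python
from typing import Optional

MIN_VALID_CM = 3.0    # closer readings are dropped as ringdown
OBSTACLE_CM = 6.0     # pivot when a valid reading is closer than this
PIVOT_LEFT_MS = 300   # assumed; the right pivot lasts twice as long
FAILSAFE_MS = 1000    # manual-mode host-silence limit


def valid_reading(cm: float) -> Optional[float]:
    """Return the reading, or None if it looks like sensor ringdown."""
    return cm if cm >= MIN_VALID_CM else None


def host_silent(now_ms: int, last_packet_ms: int) -> bool:
    """True when the host has been quiet long enough to cut the motors."""
    return now_ms - last_packet_ms > FAILSAFE_MS


class Autopilot:
    def __init__(self) -> None:
        self.state = "forward"
        self.pivot_until_ms = 0

    def _start_pivot(self, state: str, now_ms: int, duration_ms: int) -> str:
        self.state = state
        self.pivot_until_ms = now_ms + duration_ms
        return state

    def step(self, now_ms: int, reading_cm: float) -> str:
        """Advance one tick (every 60 ms in the text) and return the action."""
        if self.state != "forward" and now_ms < self.pivot_until_ms:
            return self.state                    # pivot still in progress
        cm = valid_reading(reading_cm)
        blocked = cm is not None and cm < OBSTACLE_CM
        if not blocked:
            self.state = "forward"
            return "forward"
        if self.state == "pivot_left":           # left pivot did not help
            return self._start_pivot("pivot_right", now_ms, 2 * PIVOT_LEFT_MS)
        return self._start_pivot("pivot_left", now_ms, PIVOT_LEFT_MS)
```

Feeding `step` a blocked reading starts a left pivot; if the re-sample after that pivot is still blocked, the next action is a right pivot of double length, and the cycle repeats until a reading clears.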

The ESP32-CAM, meanwhile, runs the stock CameraWebServer example from the Arduino-ESP32 distribution, unchanged: it joins the same Wi-Fi network and serves its web interface on port 80, while that example exposes the MJPEG stream itself at /stream on port 81. The host pulls that stream over HTTP and shows it in its own window. The control plane (PC ↔ ESP8266) and the video plane (PC ← ESP32-CAM) are entirely independent: neither side knows about the other, which means a hiccup on the video side can't stall the controls, and vice versa.
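
How the original script decodes the stream is not stated. One common approach, sketched here as an assumption, is to read the HTTP response in chunks and cut out complete JPEG images by their start and end markers. The function name and the buffering scheme are illustrative only.

```python
from typing import List, Tuple

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def extract_frames(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split a chunk of MJPEG bytes into complete JPEG frames.

    Returns (frames, leftover), where leftover must be prepended
    to the next chunk read from the stream.
    """
    frames: List[bytes] = []
    while True:
        start = buffer.find(SOI)
        if start < 0:
            # Keep a trailing 0xFF in case the marker is split across chunks.
            return frames, (buffer[-1:] if buffer.endswith(b"\xff") else b"")
        end = buffer.find(EOI, start + 2)
        if end < 0:
            return frames, buffer[start:]        # frame not finished yet
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]
```

In a viewer loop, each chunk read from the stream would be appended to the leftover from the previous call, and every returned frame handed to an image decoder for display.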

pm/prj2026/jan.vaduva/raul_ionut.nastasie.1779633256.txt.gz · Last modified: 2026/05/24 17:34 by raul_ionut.nastasie
CC Attribution-Share Alike 3.0 Unported