ESP32 Digital Camera

Introduction

The objective of this project is to use Internet of Things devices to create a wireless, interactive camera streaming system that transmits live footage from an ESP32-CAM module to a second ESP32 board with a 2.8” touchscreen display. A Flask-based web server is also part of the configuration and handles processing, control, and gallery administration. This project demonstrates how computer vision and user interfaces can be integrated with Internet of Things components like the ESP32 to create responsive, real-time video applications for hobbyist, educational, or surveillance purposes.

Concept

Images are captured by the ESP32-CAM module in JPEG format and sent to a local Flask server over HTTP POST via Wi-Fi.

Incoming frames are processed by the server, which then uses OpenCV and PIL to apply filters like grayscale, sepia, and cartoon effects before sending them to a second ESP32 device and a web dashboard.

The second ESP32 functions as a wireless receiver and has an XPT2046 touchscreen and a 320×240 TFT display. Through an easy-to-use touch-based user interface, it retrieves the most recent frames from the Flask server, shows them in real time, and facilitates interaction.

Hardware Description

• ESP32-CAM: Captures and uploads JPEG frames to the Flask server every 100 ms (~10 FPS). Flash control is available via GPIO4, synchronized with server-side toggle requests.

• ESP32-TFT Display: Uses the TFT_eSPI and TJpg_Decoder libraries to render the MJPEG stream on-screen. The touch interface (XPT2046) enables selecting filters, capturing snapshots, toggling flash, and browsing the image gallery.

• Flask Server (Python): Receives and redistributes camera frames, applies live filters, handles snapshot saving, and serves a responsive web dashboard with gallery access and filter controls.

Functionality Breakdown

ESP32-CAM Logic:

• Connects to Wi-Fi and initializes the camera hardware, then captures frames with the esp_camera_fb_get() function and sends a JPEG image to the Flask server every 100 ms via the /upload endpoint. It also polls /get_flash_state every loop cycle to synchronize the onboard flash LED with user toggles from the dashboard or the ESP32 display.

Flask Server Logic:

• Receives images from the ESP32-CAM and stores the latest frame in memory; this frame is displayed on the dashboard and also sent to the ESP32 display via /video_feed. It also maintains a JSON-compatible snapshot gallery index, used by the ESP32 display for navigation.

ESP32 Display Logic:

• Connects to the same Wi-Fi network and begins polling /video_feed from the server. Decodes JPEG frames and renders them using TJpg_Decoder, overlaying visual filters via direct pixel manipulation. Offers a touch UI with 4 buttons that let the user apply a visual filter, access the snapshot gallery, take a snapshot, and toggle the LED flash on and off.

Synchronization and Interactivity

• All of the components are synchronized via the Flask server, establishing the following data flows:

                  ESP32-CAM --> Flask (/upload, /get_flash_state)
                  Flask --> ESP32-Display (/video_feed, /snapshots)
                  Web Dashboard --> Flask (/set_filter, /set_flash, etc.)
                  ESP32-Display <--> Flask (/save_snapshot, /gallery_list)

Code Breakdown

Two ESP32 boards are integrated in this project: one for capturing and sending image frames, and another for receiving and rendering live streams. A Flask server serves as the central point for receiving, processing, storing, and serving frames. Each ESP32 device carries out distinct real-time duties in the system's distributed Internet of Things architecture, while the central Flask server controls user access and state.

The ESP32-CAM Firmware

Its primary duties are initializing the OV2640 camera sensor, establishing a Wi-Fi connection, sending JPEG frames to the server at regular intervals, and polling the server for the flash control state.

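camera_fb_t* fb = esp_camera_fb_get();
if (fb) {                                  // capture can fail and return NULL
    HTTPClient http;
    http.begin(server_url);
    http.addHeader("Content-Type", "image/jpeg");
    http.POST(fb->buf, fb->len);           // raw JPEG bytes as the request body
    http.end();                            // release the connection
    esp_camera_fb_return(fb);              // hand the frame buffer back to the driver
}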

Every 100 milliseconds, the ESP32-CAM simulates a live stream at about 10 frames per second by using the esp_camera library to get a frame (camera_fb_t), which is then sent via HTTPClient to the /upload endpoint on the Flask server.
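The 100 ms pacing described above can be done without blocking the loop, so the flash-state poll still runs between uploads. The following is a minimal sketch, not the project's actual firmware: FramePacer is a hypothetical helper, and the tick value passed to due() is assumed to come from something like Arduino's millis().

```cpp
#include <cstdint>

// Hypothetical helper (not part of the original firmware): decides when the
// next frame is due, given a millisecond tick such as Arduino's millis().
class FramePacer {
public:
    explicit FramePacer(uint32_t intervalMs) : intervalMs_(intervalMs) {}

    // Returns true at most once per interval. Unsigned subtraction keeps the
    // comparison correct when the 32-bit tick counter wraps around.
    bool due(uint32_t nowMs) {
        if (nowMs - lastMs_ < intervalMs_) return false;
        lastMs_ = nowMs;
        return true;
    }

private:
    uint32_t intervalMs_;
    uint32_t lastMs_ = 0;
};
```

In a firmware loop this would be used as `if (pacer.due(millis())) { /* capture and upload */ }`, with an interval of 100 giving the roughly 10 frames per second mentioned above.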

Flash synchronization is handled through:

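http.begin(flash_check_url);
int flashRes = http.GET();
if (flashRes == 200) {
    String state = http.getString();              // assumed to be the plain text "on" or "off"
    state.trim();
    digitalWrite(4, state == "on" ? HIGH : LOW);  // GPIO4 drives the flash LED
}
http.end();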

The GPIO4 pin controls the onboard flash LED, toggled based on the server state.

The ESP32-Display Firmware

Its primary duties are connecting to the Flask /video_feed endpoint and decoding the MJPEG stream, rendering frames in real time on a 320×240 TFT screen, showing and managing filter settings via touch, and enabling snapshot capture and gallery browsing.

1. JPEG Streaming and Decoding

The Flask server exposes an MJPEG-style endpoint (/video_feed) that sends JPEG frames continuously. The ESP32 display board retrieves the JPEG byte stream over a persistent TCP connection, opened as follows:

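streamClient.connect(flask_ip, 5000);
streamClient.println("GET /video_feed HTTP/1.1");
streamClient.println("Host: " + String(flask_ip));  // HTTP/1.1 requires a Host header
streamClient.println();                              // blank line ends the request headers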

As data arrives, the board detects JPEG frame boundaries using standard JPEG markers:

  • 0xFFD8 → Start of Image (SOI)
  • 0xFFD9 → End of Image (EOI)
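The marker scan described above might look like the following minimal sketch. FrameSpan and findJpegFrame are hypothetical names, not taken from the project's firmware, and the code assumes the frames carry no embedded thumbnail (which would contain its own nested SOI/EOI pair).

```cpp
#include <cstddef>
#include <cstdint>

// Location of one complete JPEG inside a byte buffer.
struct FrameSpan {
    bool found;
    size_t start;   // index of the 0xFF byte of the SOI marker
    size_t length;  // number of bytes from SOI through EOI inclusive
};

// Hypothetical helper: scans buf for the first SOI (FF D8) that is later
// followed by an EOI (FF D9). Returns found == false when the buffer holds
// only a partial frame, so the caller can keep reading or discard it.
FrameSpan findJpegFrame(const uint8_t* buf, size_t len) {
    bool haveStart = false;
    size_t start = 0;
    for (size_t i = 0; i + 1 < len; ++i) {
        if (buf[i] != 0xFF) continue;
        if (!haveStart && buf[i + 1] == 0xD8) {
            haveStart = true;
            start = i;
            ++i;  // skip the second marker byte
        } else if (haveStart && buf[i + 1] == 0xD9) {
            return {true, start, i + 2 - start};
        }
    }
    return {false, 0, 0};
}
```

Returning an offset and a length rather than copying lets the caller pass `buf + start` and `length` straight to the decoder.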

Once a full frame is received, it is passed to the TJpg_Decoder library:

TJpgDec.drawJpg(0, 0, jpgBuf, pos);

The TJpgDec.drawJpg() function decodes the JPEG data straight from memory and, through a callback, renders it to the screen block by block. Only the compressed JPEG has to be held in RAM; the decoded image is never buffered in full, which lowers memory use and allows fluid, real-time display on devices with limited memory such as the ESP32. For direct TFT compatibility, color conversion to RGB565 is handled internally.

2. Touch UI and Button Mapping

Four primary buttons are mapped on the touchscreen interface: one to access the filter menu, which opens an overlay selection; another to capture the current frame; a gallery button, which loads the photographs stored on the Flask server; and a button to turn the LED flash on or off.

Touch input is handled using the XPT2046_Touchscreen library. Raw analog readings are remapped to screen coordinates:

int x = map(p.y, 200, 3800, 0, SCREEN_W);
int y = map(p.x, 240, 3700, 0, SCREEN_H);

The screen is divided into interaction zones based on button boundaries:

No  Area                                Action
1   y > 200                             Bottom UI buttons (4 zones)
2   x < 80 && y < 80                    Filter menu (4 filters)
3   galleryMode && (x < 60 || x > 180)  Prev/Next buttons
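The coordinate remapping and the zone table above can be combined into a small hit-test routine. This is a sketch under stated assumptions: mapRange reproduces the integer arithmetic of Arduino's map(), while Zone, toScreen, classify and bottomButton are hypothetical names. The order in which overlapping zones are checked and the equal width of the four bottom buttons are also assumptions, since the source does not specify them.

```cpp
#include <cstdint>

constexpr int SCREEN_W = 320;
constexpr int SCREEN_H = 240;

// Same integer arithmetic as Arduino's map().
long mapRange(long v, long inMin, long inMax, long outMin, long outMax) {
    return (v - inMin) * (outMax - outMin) / (inMax - inMin) + outMin;
}

struct Point { int x; int y; };

// Converts raw XPT2046 readings to screen pixels. The axes are swapped for
// landscape orientation, as in the original snippet; results are clamped
// so that readings outside the calibration range stay on screen.
Point toScreen(int rawX, int rawY) {
    long x = mapRange(rawY, 200, 3800, 0, SCREEN_W);
    long y = mapRange(rawX, 240, 3700, 0, SCREEN_H);
    if (x < 0) x = 0;
    if (x > SCREEN_W - 1) x = SCREEN_W - 1;
    if (y < 0) y = 0;
    if (y > SCREEN_H - 1) y = SCREEN_H - 1;
    return {static_cast<int>(x), static_cast<int>(y)};
}

enum class Zone { None, BottomBar, FilterMenu, GalleryPrev, GalleryNext };

// Applies the zone table. Checking the open filter menu first is an assumption.
Zone classify(Point p, bool menuVisible, bool galleryMode) {
    if (menuVisible && p.x < 80 && p.y < 80) return Zone::FilterMenu;
    if (p.y > 200) return Zone::BottomBar;
    if (galleryMode && p.x < 60) return Zone::GalleryPrev;
    if (galleryMode && p.x > 180) return Zone::GalleryNext;
    return Zone::None;
}

// Index 0..3 of the bottom-bar button under x, assuming equal-width buttons.
int bottomButton(int x) { return x / (SCREEN_W / 4); }
```

Keeping the classification separate from the drawing code makes it possible to check the zone boundaries on a desktop build before flashing the board.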

3. Real-Time Filter Processing

Before rendering, frames can be processed in software to apply effects, implemented by iterating over each pixel's color components:

  • Grayscale: gray = (r + g + b) / 3
  • Sepia: applies a weighted RGB transformation for a warm tone
  • Invert: r = 255 - r (and likewise for g and b)

The active filter is chosen from the on-screen menu:

if (menuVisible && x < 80 && y < 80) {
    if (y < 20) currentEffect = "None";
    else if (y < 40) currentEffect = "Grayscale";
    else if (y < 60) currentEffect = "Sepia";
    else if (y < 80) currentEffect = "Invert";
    menuVisible = false;
    needsUIRedraw = true;
    return;
}

The selected transformation is applied inside jpegDrawCallback(), after which each pixel is repacked as RGB565:

bitmap[i] = ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3);
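To make the per-pixel step concrete, here is a minimal sketch of the three filters operating on RGB565 pixels. The grayscale and invert formulas and the packing expression follow the text above. The sepia weights are an assumption (a commonly used sepia matrix), since the source does not give its coefficients, and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

struct Rgb { uint8_t r, g, b; };

// Expands an RGB565 pixel to 8-bit channels (the low bits are left at zero).
Rgb unpack565(uint16_t p) {
    return {uint8_t((p >> 8) & 0xF8), uint8_t((p >> 3) & 0xFC), uint8_t((p << 3) & 0xF8)};
}

// Same packing expression as in the callback excerpt above.
uint16_t pack565(Rgb c) {
    return uint16_t(((c.r & 0xF8) << 8) | ((c.g & 0xFC) << 3) | (c.b >> 3));
}

Rgb grayscale(Rgb c) {
    uint8_t gray = uint8_t((c.r + c.g + c.b) / 3);
    return {gray, gray, gray};
}

Rgb invert(Rgb c) {
    return {uint8_t(255 - c.r), uint8_t(255 - c.g), uint8_t(255 - c.b)};
}

// Assumed weights: a widely used sepia matrix, scaled by 1000 to stay in
// integer arithmetic. The project's actual coefficients may differ.
Rgb sepia(Rgb c) {
    int r = (393 * c.r + 769 * c.g + 189 * c.b) / 1000;
    int g = (349 * c.r + 686 * c.g + 168 * c.b) / 1000;
    int b = (272 * c.r + 534 * c.g + 131 * c.b) / 1000;
    return {uint8_t(std::min(r, 255)), uint8_t(std::min(g, 255)), uint8_t(std::min(b, 255))};
}

// Applies one filter to a block of RGB565 pixels in place, as a decoder
// callback would do before pushing the block to the display.
void applyToBlock(uint16_t* bitmap, size_t count, Rgb (*filter)(Rgb)) {
    for (size_t i = 0; i < count; ++i) {
        bitmap[i] = pack565(filter(unpack565(bitmap[i])));
    }
}
```

Integer-only arithmetic is used here because it is cheap on a microcontroller; the cost is a small rounding error compared with floating-point weights.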

4. Snapshot and Gallery Handling

void captureSnapshot() {
  HTTPClient http;
  http.begin("http://" + String(flask_ip) + ":5000/save_snapshot");
  http.POST("");
  http.end();
}

void fetchGalleryList() {
  HTTPClient http;
  http.begin("http://" + String(flask_ip) + ":5000/gallery_list");
  int res = http.GET();
  if (res == 200) {
    String payload = http.getString();
    DynamicJsonDocument doc(2048);
    if (deserializeJson(doc, payload)) { http.end(); return; }  // bail out on malformed JSON
    gallery_count = 0;
    for (JsonVariant v : doc.as<JsonArray>()) {
      gallery_list[gallery_count++] = v.as<String>();
      if (gallery_count >= 50) break;
    }
  }
  http.end();
}

void showGalleryImage() {
  if (gallery_count == 0 || !jpgBuf) return;
  String filename = gallery_list[current_gallery_index];
  String url = "http://" + String(flask_ip) + ":5000/snapshots/" + filename;

  HTTPClient http;
  http.begin(url);
  int httpCode = http.GET();
  if (httpCode == 200) {
    WiFiClient* stream = http.getStreamPtr();
    int pos = 0;
    while (stream->connected() && stream->available() && pos < 60 * 1024) {
      int c = stream->read();
      if (c < 0) break;
      jpgBuf[pos++] = (uint8_t)c;
    }
    if (pos > 1000) {
      TJpgDec.drawJpg(0, 0, jpgBuf, pos);
      if (needsUIRedraw) drawUI();
    }
  }
  http.end();
}

captureSnapshot() triggers the Flask server to save the current last_frame as a .jpg file. fetchGalleryList() retrieves the list of saved file names, and showGalleryImage() fetches each image by name from /snapshots/name and renders it. The user can step through all of the photos by tapping the edge of the screen to change the photo currently displayed.
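Stepping through the gallery comes down to moving current_gallery_index within the bounds of gallery_count. The sketch below assumes wrap-around behaviour (going past the last photo returns to the first), which the source does not confirm; stepGalleryIndex is a hypothetical helper.

```cpp
// Hypothetical helper: moves the gallery index by delta (+1 for next,
// -1 for previous) and wraps around at both ends. Wrap-around is an
// assumption; the firmware may clamp at the ends instead.
int stepGalleryIndex(int current, int count, int delta) {
    if (count <= 0) return 0;  // empty gallery: nothing to select
    int next = (current + delta) % count;
    return next < 0 ? next + count : next;
}
```

A Prev tap would then call `stepGalleryIndex(current_gallery_index, gallery_count, -1)` before redrawing the image.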

Final Product

After successfully uploading the ESP32 firmware and setting up the Python server, the final project should appear as follows:

ESP32 Display:

Web Server

Home Page:

Gallery:

Issues and Solutions

Several difficulties arose during development, most notably in memory management, touchscreen input accuracy, real-time streaming performance, and hardware-specific constraints.

Achieving seamless MJPEG video streaming from the ESP32-CAM to the ESP32 display board was one of the initial challenges. Image tearing and some dropped frames were noted, particularly when frame boundaries were not accurately identified. In order to fix this, JPEG start and end markers (0xFFD8 and 0xFFD9) were carefully parsed, and any incomplete frames were discarded. Reliability was further enhanced by raising the JPEG buffer to 60 KB.

Real-time JPEG image decoding and rendering while maintaining touchscreen UI responsiveness was another formidable obstacle. Because of its effectiveness in decoding JPEGs straight to the TFT display with minimal memory overhead, the TJpg_Decoder library was chosen, enabling real-time rendering at 10–30 frames per second, depending on network speed.

Integrating the touchscreen through the XPT2046 controller also required additional work. Because the touch controller is interfaced via the HSPI bus, a second SPIClass had to be constructed manually, and the coordinates had to be adjusted to match the screen's landscape orientation. To prevent unintentional taps, extra care was taken to debounce inputs and to separate the UI areas from the video rendering area.

The limited documentation and patchy support for the ESP32-2432S028R “cheap yellow display” board were a significant development obstacle. Important details such as touchscreen wiring, backlight control, and SPI pin mappings were not easily accessible in official sources. Much of the integration was achieved through reverse engineering, trial-and-error testing, and adaptation of examples from community forums.

HTTP communication with the Flask backend occasionally resulted in timeouts or incomplete responses. This was fixed by decoupling lengthy operations (such as uploading a snapshot) from the user interface loop and by wrapping all HTTP requests in error checking and retry logic.
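One way to structure such retry logic is sketched below. It is not the project's actual code: requestWithRetry is a hypothetical helper, the request callable is assumed to wrap a call such as http.GET() and return its status code (the ESP32 HTTPClient reports connection errors as negative values), and the wait callable is assumed to wrap something like delay().

```cpp
#include <functional>

// Hypothetical helper: calls request() until it returns HTTP 200 or the
// attempts run out, and returns the last status code seen.
// wait(attempt) is invoked between attempts so the caller chooses the
// back-off policy (for example, a fixed or growing delay).
int requestWithRetry(const std::function<int()>& request,
                     int maxAttempts,
                     const std::function<void(int)>& wait) {
    int status = -1;
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        status = request();
        if (status == 200) break;               // success, stop retrying
        if (attempt < maxAttempts) wait(attempt);
    }
    return status;
}
```

Passing the request and the wait as callables keeps the helper independent of the HTTP library, so the retry policy can be exercised on a desktop build with stubbed requests.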

Despite these issues, all major system components now function as intended. The lessons learned highlight the importance of resource management, firmware modularity, and planning for hardware inconsistencies during embedded system development.

Conclusions

This project uses two ESP32 boards, the ESP32-CAM for video capture and the ESP32-2432S028R for display, to demonstrate a real-time wireless image streaming system. MJPEG streaming, real-time picture filtering, snapshots, and a touch-controlled gallery are all made possible by a purpose-built Flask server. Despite the hardware limitations of the ESP32, we were able to create an interactive, lightweight GUI by combining Wi-Fi communication with libraries such as TJpg_Decoder, TFT_eSPI, and XPT2046_Touchscreen. The project demonstrates how responsive IoT camera interfaces can be built with inexpensive microcontrollers and open-source technologies, potentially leading to applications for handheld cameras, monitoring, and surveillance.

References

iothings/proiecte/2025sric/digitalcamera.txt · Last modified: 2025/05/28 21:35 by adrian.vladescu
CC Attribution-Share Alike 3.0 Unported