Image Recognition

Author: Alexandru Vrabii
Email: alexandru.vrabii@stud.acs.upb.ro
Master: AAC
Academic year: 2023-2024

Introduction

Context

The idea for this project came from the desire to take my bachelor project in a different direction. I wanted to build an image recognition system on the ESP32, a microcontroller that is not designed for image processing and analysis. Such a system would reduce the cost of image recognition: the ESP32-CAM board is much cheaper than a dedicated image processing board such as the NVIDIA Jetson Nano.

Objectives

The ideal scenario is to collect images with the ESP32-CAM and train a machine learning model that learns to recognize three types of objects. The ESP32-CAM then recognizes the object and sends the result to a Firebase database. An ESP8266 retrieves that result from Firebase and, depending on the recognized object, controls an SG90 servo motor to simulate sorting the objects.

During development I encountered several difficulties, and I now understand why the ESP32-CAM is not the best choice for an image recognition project. However, I managed to solve some of them and can show an MVP of the initial idea.

Demo

Presentation slides

Architecture

Hardware

Components

AI Thinker ESP32-CAM - based on the ESP-32S module, which integrates Wi-Fi and Bluetooth
ESP8266 - low-cost Wi-Fi microchip with built-in TCP/IP networking software and microcontroller capability
SG90 - tiny, lightweight servo motor with high output power

Circuit Diagram

Software

Software architecture

The software architecture has three parts:

Data acquisition and ML model training
Image Recognition and Firebase Population
Data acquisition from Firebase and decision making

Used methods

Image Capturing

For the video stream, I used the Eloquent library for the ESP32-CAM. The streaming code comes from the examples provided by the Eloquent team; the example sketch is called 4_Video_Feed. I made minor modifications so that it connects to my personal Wi-Fi network. This code is loaded directly onto the ESP32-CAM board.

For acquiring, storing, and processing the images and for training the ML model, I used the Python library everywhereml. To store pictures of the objects to recognize, I first connected to the IP address of the ESP32-CAM that streams the video, then started the image collection process. I collected about 4000 pictures for 3 classes: 'background', the empty background without recognizable objects; 'alenka', a type of candy; and 'menthol', another type of candy. Once the images were collected and grouped in folders named after their classes, image processing can begin. To reduce the training time of the ML model, the images were converted to grayscale and their resolution was reduced to 40×30 pixels.
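
As a rough illustration of the two preprocessing steps described above, the following sketch converts an RGB image to grayscale and shrinks it by averaging blocks of pixels. It is not the everywhereml implementation: the function names to_grayscale and downscale, the luminance weights, and the block-averaging strategy are assumptions chosen for clarity, and images are represented as plain nested lists.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples into rows of 0-255 gray levels."""
    # ITU-R BT.601 luminance weights (an assumption; the library may weight differently)
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in rgb_image
    ]


def downscale(gray_image, width, height):
    """Shrink a grayscale image by averaging the source block behind each output pixel."""
    src_h, src_w = len(gray_image), len(gray_image[0])
    if width > src_w or height > src_h:
        raise ValueError("target size must not exceed the source size")
    out = []
    for y in range(height):
        y0, y1 = y * src_h // height, (y + 1) * src_h // height
        row = []
        for x in range(width):
            x0, x1 = x * src_w // width, (x + 1) * src_w // width
            block = [gray_image[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(round(sum(block) / len(block)))
        out.append(row)
    return out


# Example: a uniform 320x240 frame (the QVGA size set in the streaming sketch)
# reduced to the 40x30 size used for training.
frame = [[(10, 10, 10)] * 320 for _ in range(240)]
small = downscale(to_grayscale(frame), width=40, height=30)
print(len(small[0]), "x", len(small))
```

At 40×30 pixels each image holds 1,200 gray values instead of the 76,800 pixels of a 320×240 frame, so there is far less data to process in the later steps.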

The transformed images are then used to train the ML model, a RandomForest classifier.

Note: Transforming the images and training the classifier takes a considerable amount of time.

After training the classifier, we convert the trained model into C++ libraries that can be used in the Arduino IDE and then upload them to the ESP32CAM.

With the trained classifier converted into libraries compatible with the Arduino IDE, the ESP32-CAM can be programmed for the actual image recognition. I created a sketch that connects to the personal Wi-Fi network and to a Firebase database. Once the ESP32-CAM camera is initialized, image recognition takes place in loop(). The name of the recognized object is written to the Firebase database, from where it can be used for different purposes.

One scenario I implemented reads the name of the recognized object on the ESP8266 board. A decision node there controls a servo motor depending on the response received from Firebase; hypothetically, this could be used to sort the recognized objects.
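
The decision node can be pictured as a lookup from the recognized label to a servo angle. The sketch below mirrors the rule used in the ESP8266 listing (90 degrees for 'alenka', 0 degrees for anything else). The names SERVO_ANGLES, DEFAULT_ANGLE, and servo_angle_for are hypothetical and only serve the illustration.

```python
# Angle per recognized label; labels that are not listed fall back to DEFAULT_ANGLE.
SERVO_ANGLES = {"alenka": 90}
DEFAULT_ANGLE = 0


def servo_angle_for(label):
    """Return the servo angle (in degrees) for a label read from the database."""
    return SERVO_ANGLES.get(label, DEFAULT_ANGLE)


for label in ("alenka", "menthol", "background"):
    print(label, "->", servo_angle_for(label))
```

Keeping the rule in a table rather than an if/else chain would make it easy to give each class its own angle later, although the project as described only distinguishes 'alenka' from everything else.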

Code & Structure

Arduino

4_Video_Feed.io - stream the video feed.

#include "esp32cam.h"
#include "esp32cam/http/LiveFeed.h"


#define WIFI_SSID ""
#define WIFI_PASS ""
Eloquent::Esp32cam::Cam cam;
Eloquent::Esp32cam::Http::LiveFeed feed(cam, 80);


void setup() {
    Serial.begin(115200);
    delay(3000);
    Serial.println("Init");
    cam.aithinker();
    cam.highQuality();
    cam.qvga();

    while (!cam.begin())
        Serial.println(cam.getErrorMessage());

    // Connect to WiFi
    // If something goes wrong, print the error message
    while (!cam.connect(WIFI_SSID, WIFI_PASS))
        Serial.println(cam.getErrorMessage());

    //Initialize live feed http server
    // If something goes wrong, print the error message
    while (!feed.begin())
        Serial.println(feed.getErrorMessage());

    // make the camera accessible at http://esp32cam.local
    if (!cam.viewAt("esp32cam"))
        Serial.println("Cannot create alias, use the IP address");
    else
        Serial.println("Live Feed available at http://esp32cam.local");

    // display the IP address of the camera
    Serial.println(feed.getWelcomeMessage());
}


void loop() {
}

ImgRec_esp32.io - handles the image recognition process and sends data to Firebase.

#include "Arduino.h"
#include "eloquent.h"
#include "eloquent/print.h"
#include "eloquent/tinyml/voting/quorum.h"
#include "eloquent/vision/camera/aithinker.h"
#include "HogPipeline.h"
#include "HogClassifier.h"
#include "Firebase_ESP_Client.h"
 
//Provide the token generation process info.
#include "addons/TokenHelper.h"
#include "WiFi.h"
//Define Firebase Data object
FirebaseData fbdo;
 
FirebaseAuth auth;
FirebaseConfig config;

//for setting server
#include "esp32cam.h"
#define WIFI_SSID "DIGI-x9kS"
#define WIFI_PASS "FkPVr3hT"


Eloquent::TinyML::Voting::Quorum<7> quorum;
String header;
String predictionLabel;

unsigned long sendDataPrevMillis = 0;
int count = 0;
bool signupOK = false;

void setup() {
  Serial.begin(115200);
  delay(3000);
  Serial.println("Begin");

  Serial.print("Connecting to ");
  Serial.println(WIFI_SSID);
  // connect to Wi-Fi
  WiFi.begin(WIFI_SSID, WIFI_PASS);
  while(WiFi.status() != WL_CONNECTED){
    delay(500);
    Serial.print("+");
  }
  Serial.println("");
  Serial.println("Connected to WiFi");
  Serial.println("IP address:");
  Serial.println(WiFi.localIP());
/*-------------------------------------- */
   /* Assign the api key (required) */
  config.api_key = "AIzaSyDdfKjkcXJSKMqDhL9y4lJ37kuIoLcBwjI";
 
  /* Assign the RTDB URL (required) */
  config.database_url = "https://imagerecognition-d21b2-default-rtdb.europe-west1.firebasedatabase.app/";

   /* Sign up */
  if (Firebase.signUp(&config, &auth, "", "")){
    Serial.println("ok");
    signupOK = true;
  }
  else{
    Serial.printf("%s\n", config.signer.signupError.message.c_str());
  }
  //start the server
  //server.begin();
  
   /* Assign the callback function for the long running token generation task */
  config.token_status_callback = tokenStatusCallback; //see addons/TokenHelper.h
 
  Firebase.begin(&config, &auth);
  Firebase.reconnectWiFi(true);
  /*------------------------------------- */

  camera.qqvga();
  camera.grayscale();

  while (!camera.begin())
    Serial.println("Cannot init camera"); 
}

void loop() {
  // run at most one recognition cycle every 400 ms, and only once Firebase is ready
  if (Firebase.ready() && signupOK && (millis() - sendDataPrevMillis > 400 || sendDataPrevMillis == 0)) {
    sendDataPrevMillis = millis();

    if (!camera.capture()) {
      Serial.println(camera.getErrorMessage());
      delay(1000);
      return;
    }

    // apply HOG pipeline to camera frame
    hog.transform(camera.buffer);

    // get a stable prediction
    uint8_t prediction = classifier.predict(hog.features);
    int8_t stablePrediction = quorum.vote(prediction);

    if (quorum.isStable()) {
      predictionLabel = classifier.getLabelOf(stablePrediction);
      Serial.println("Stable prediction: " + predictionLabel);
    }
    camera.free();

    // write the latest stable label as a string to the database path test/String
    if (Firebase.RTDB.setString(&fbdo, "test/String", predictionLabel)) {
      Serial.println("PASSED");
      Serial.println("PATH: " + fbdo.dataPath());
      Serial.println("TYPE: " + fbdo.dataType());
    }
    else {
      Serial.println("FAILED");
      Serial.println("REASON: " + fbdo.errorReason());
    }
  }
}
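
The recognition sketch smooths the per-frame predictions with a quorum over the last 7 votes (Quorum<7>). The exact rule used by the Eloquent library is not shown in this document, so the Python sketch below assumes one plausible rule: a label counts as stable when it holds a strict majority of the last 7 predictions. The class QuorumVoter and its method names are hypothetical.

```python
from collections import Counter, deque


class QuorumVoter:
    """Keep the most recent predictions and report a label only when it dominates."""

    def __init__(self, window=7):
        self.window = window
        self.votes = deque(maxlen=window)  # oldest vote is dropped automatically

    def vote(self, prediction):
        """Record a prediction; return the majority label, or None when unstable."""
        self.votes.append(prediction)
        if len(self.votes) < self.window:
            return None  # not enough history yet
        label, count = Counter(self.votes).most_common(1)[0]
        # assumed rule: strict majority of the window
        return label if count > self.window // 2 else None


voter = QuorumVoter(window=7)
for frame_prediction in [1, 1, 1, 1, 0, 0, 1]:
    stable = voter.vote(frame_prediction)
print("stable label:", stable)
```

A filter of this kind trades reaction time for stability: a single misclassified frame no longer changes the published label, but a real change of object needs several frames before it is reported.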

esp86_receiver.io - get data from Firebase and control the servo motor.

#include "Arduino.h"
#include "ESP8266WiFi.h" // provides the WiFi object used in setup()
#include "Firebase_ESP_Client.h"

#include "Servo.h" // servo library  
Servo s1;  

//Provide the token generation process info.
#include "addons/TokenHelper.h"
 
// Insert your network credentials
#define WIFI_SSID "DIGI-x9kS"
#define WIFI_PASSWORD "FkPVr3hT"
 
// Insert Firebase project API Key
#define API_KEY "AIzaSyDdfKjkcXJSKMqDhL9y4lJ37kuIoLcBwjI"
 
// Insert RTDB URL
#define DATABASE_URL "https://imagerecognition-d21b2-default-rtdb.europe-west1.firebasedatabase.app/" 
 
//Define Firebase Data object
FirebaseData fbdo;
 
FirebaseAuth auth;
FirebaseConfig config;
 
unsigned long sendDataPrevMillis = 0;
String stringValue;
bool signupOK = false;
 
void setup() {
  Serial.begin(115200);
  WiFi.begin(WIFI_SSID, WIFI_PASSWORD);
  Serial.print("Connecting to Wi-Fi");
  while (WiFi.status() != WL_CONNECTED) {
    Serial.print(".");
    delay(300);
  }
  Serial.println();
  Serial.print("Connected with IP: ");
  Serial.println(WiFi.localIP());
  Serial.println();
 
  /* Assign the api key (required) */
  config.api_key = API_KEY;
 
  /* Assign the RTDB URL (required) */
  config.database_url = DATABASE_URL;
 
  /* Sign up */
  if (Firebase.signUp(&config, &auth, "", "")) {
    Serial.println("ok");
    signupOK = true;
  }
  else {
    Serial.printf("%s\n", config.signer.signupError.message.c_str());
  }
 
  /* Assign the callback function for the long running token generation task */
  config.token_status_callback = tokenStatusCallback; //see addons/TokenHelper.h
 
  Firebase.begin(&config, &auth);
  Firebase.reconnectWiFi(true);
  // Serial.println("Leaving Setup");

  s1.attach(0);  // attach the servo to GPIO0 (labeled D3 on NodeMCU-style boards)
}

void loop() {
  // poll Firebase at most once every 400 ms
  if (Firebase.ready() && signupOK && (millis() - sendDataPrevMillis > 400 || sendDataPrevMillis == 0)) {
    sendDataPrevMillis = millis();
    if (Firebase.RTDB.getString(&fbdo, "/test/String")) {
      if (fbdo.dataType() == "string") {
        stringValue = fbdo.stringData();
        Serial.println(stringValue);
        // turn the servo to 90 degrees for 'alenka', back to 0 for anything else
        if (stringValue == "alenka") {
          s1.write(90);
        }
        else {
          s1.write(0);
        }
      }
    }
    else {
      Serial.println(fbdo.errorReason());
    }
  }
}
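
Both sketches gate their work with the same millis() check, so that Firebase is contacted at most once every 400 ms and the first pass runs immediately. A minimal Python sketch of that pattern follows; the Throttle class and its injectable clock are hypothetical and exist only to make the behavior easy to try out.

```python
import time


class Throttle:
    """Allow an action at most once per interval, like the millis() check in loop()."""

    def __init__(self, interval_s, clock=time.monotonic):
        self.interval_s = interval_s
        self.clock = clock
        self.last = None  # no action performed yet, so the first call is allowed

    def ready(self):
        """Return True and restart the interval if enough time has passed."""
        now = self.clock()
        if self.last is None or now - self.last > self.interval_s:
            self.last = now
            return True
        return False


# Usage with a fake clock so the timing is easy to follow
fake_time = [0.0]
throttle = Throttle(0.4, clock=lambda: fake_time[0])
first_call = throttle.ready()   # first call passes
fake_time[0] = 0.2
second_call = throttle.ready()  # only 0.2 s elapsed, blocked
print(first_call, second_call)
```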

Python

Image acquisition

# Collect images from the ESP32-CAM web server
from logging import basicConfig, INFO
from everywhereml.data import ImageDataset
from everywhereml.data.collect import MjpegCollector

base_folder = 'IOT_Captures_web'
IP_ADDRESS_OF_ESP = 'http://192.168.101.35:81'
basicConfig(level=INFO)

try:
    # reuse a dataset that was already collected into base_folder
    image_dataset = ImageDataset.from_nested_folders(
        name='Candies',
        base_folder=base_folder
    )
except FileNotFoundError:
    # nothing on disk yet: collect images for each class from the live MJPEG stream
    mjpeg_collector = MjpegCollector(address=IP_ADDRESS_OF_ESP)
    image_dataset = mjpeg_collector.collect_many_classes(
        dataset_name='Candies',
        base_folder=base_folder,
        duration=30
    )

print(image_dataset)

Image transformation

from test import image_dataset
from everywhereml.preprocessing.image.object_detection import HogPipeline
from everywhereml.preprocessing.image.transform import Resize

image_dataset = image_dataset.gray().uint8()

pipeline = HogPipeline(
    transforms=[
        Resize(width=40, height=30)
    ]
)

# Convert images to feature vectors
feature_dataset = pipeline.fit_transform(image_dataset)
feature_dataset.describe()

Train the Image Recognition model

from everywhereml.sklearn.ensemble import RandomForestClassifier

import pipeline as pipeline
from pipeline import feature_dataset

for i in range(10):
    clf = RandomForestClassifier(n_estimators=500, max_depth=5)

    # fit on train split and get accuracy on the test split
    train, test = feature_dataset.split(test_size=0.4, random_state=i)
    clf.fit(train)

    print('Score on test set: %.2f' % clf.score(test))

# final fit on the complete dataset; this is the model that gets exported
clf.fit(feature_dataset)
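
The loop above scores the classifier on ten different random 60/40 splits, one per random_state value. How the split method of everywhereml shuffles internally is not shown here; the sketch below is only an assumed, simplified picture of a seeded hold-out split, using a hypothetical split_indices helper.

```python
import random


def split_indices(n_samples, test_size, random_state):
    """Return (train, test) index lists for a seeded random hold-out split."""
    indices = list(range(n_samples))
    # a dedicated Random instance makes the shuffle reproducible per seed
    random.Random(random_state).shuffle(indices)
    n_test = round(n_samples * test_size)
    return indices[n_test:], indices[:n_test]


# Ten different splits of a 10-sample toy dataset, one per seed
for seed in range(10):
    train_idx, test_idx = split_indices(10, test_size=0.4, random_state=seed)
    print(seed, sorted(test_idx))
```

Because each seed yields a different test set, the spread of the ten printed scores gives a rough idea of how much the accuracy depends on which images happen to be held out.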

Transform model in C++ code library

from pipeline import pipeline
from pipeline import feature_dataset

print(pipeline.to_arduino_file(
    filename=r'C:\Users\xiaomi\Desktop\Master an 2\IOT\Proiect_IOT\HogPipeline.h',
    instance_name='hog'
))


from ML_train import clf

print(clf.to_arduino_file(
    filename=r'C:\Users\xiaomi\Desktop\Master an 2\IOT\Proiect_IOT\HogClassifier.h',
    instance_name='classifier', 
    class_map=feature_dataset.class_map
))

Results

The results of the project are mixed. On the one hand, I was able to run a working image recognition model on an ESP32. On the other hand, recognition is very unstable: a slight change in background, lighting, or camera perspective makes the model fail.

To address these problems, I first increased the number of training images. Training took much longer than before, and when the program ran on the board, the ESP32-CAM camera refused to initialize, so recognition could not be performed. A second attempt was to increase the image quality: I took the dataset pictures with my phone and trained the model on them. Again, training took a very long time, and the same problem occurred: the camera failed to get past initialization, which forced the board to reboot.

In the end, I improved the model by increasing the number of estimators used to train the classifier. This increased recognition accuracy, but the system's sensitivity to the other factors remained the same.

Conclusion

In conclusion, the ESP32 family can be used for image recognition, but it is important to keep in mind that ESP32 boards were not designed for image processing. Although model training does not take place on the board, the recognition process itself can be unstable. I would recommend the ESP32 for image recognition when a cheap and compact solution is needed and the objects to recognize are easy to tell apart.