Alexandru-Iulian Constantinescu - ACES
aconstantinescu1108@stud.etti.upb.ro
The purpose of this project is to detect the presence of bears in areas populated by humans and to notify one or more people. To achieve this, an ultrasonic sensor periodically measures the distance to the nearest object. If the distance is small enough, something has moved in front of the sensor, and a camera takes a photo of it. A neural network then analyzes whether the photo contains a bear. If it does, the photo is uploaded via WiFi to a Firebase server and emailed to one or more people.
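The sequence of steps described above can be summarized as a short control-flow sketch. It is written in Python purely for illustration (the on-board code is an Arduino sketch), and every name in it is hypothetical: the five callables stand in for the sensor, camera, classifier, Firebase upload and email routines, and the 100 cm trigger distance is an assumed value, not one taken from the project.

```python
from typing import Callable, Sequence


def run_detection_step(
    measure_distance_cm: Callable[[], float],
    capture_photo: Callable[[], bytes],
    is_bear: Callable[[bytes], bool],
    upload_photo: Callable[[bytes], None],
    email_photo: Callable[[bytes, Sequence[str]], None],
    recipients: Sequence[str],
    trigger_distance_cm: float = 100.0,  # assumed threshold, not from the project
) -> str:
    """Run one pass of the detection loop and report what happened."""
    # Step 1: nothing is close to the sensor, so no photo is taken.
    if measure_distance_cm() > trigger_distance_cm:
        return "idle"
    # Step 2: something is in front of the sensor; photograph it.
    photo = capture_photo()
    # Step 3: only photos classified as bears are reported.
    if not is_bear(photo):
        return "not_bear"
    # Step 4: store the photo online and notify the recipients.
    upload_photo(photo)
    email_photo(photo, recipients)
    return "bear_reported"
```

Passing the hardware and network routines in as arguments keeps the decision logic testable on a computer, without the board attached.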
For this project I used:
For uploading the code to the board, a configuration similar to this one is necessary:
The FTDI-USB adapter (depicted above in red) is necessary because the ESP32-CAM module has no port for connecting directly to a computer to receive data. The module also receives power through the adapter. The VCC and GND pins can be linked directly between the ESP32-CAM and the FTDI232, but I chose to use a small breadboard between them so that I could also power the ultrasonic sensor. The FTDI232 also has a jumper for selecting the voltage, either 3.3V or 5V. I used 5V, as recommended by the online community.
Also, the GPIO0 pin of the ESP32-CAM needs to be connected to ground to put the module in flash mode.
For normal operation, the jumper between GPIO0 and GND needs to be removed from the ESP32-CAM, and the board then restarted via its reset button.
As the board and the sensor only need to be supplied with power, the configuration above can also be changed to something similar to this, to be more portable (possibly with a resistor between the battery and the sensor and board, if the voltage provided by the battery is greater than 5V):
The flowchart of the software functionality is the following:
Because computing the distance is quite simple (I use the formula distance = (duration / 2) * 0.0343 after getting the duration from a function call), I will discuss the other parts of the software implementation in more depth:
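The distance formula mentioned above can be checked off the board with a few lines of Python. This is a minimal sketch that assumes the duration is the round-trip echo time in microseconds, in which case the constant 0.0343 is the speed of sound in centimeters per microsecond (about 343 m/s) and the result is in centimeters. The function and constant names are illustrative.

```python
SPEED_OF_SOUND_CM_PER_US = 0.0343  # about 343 m/s, expressed in cm per microsecond


def distance_cm(echo_duration_us: float) -> float:
    """Convert a round-trip echo time in microseconds to a one-way distance in cm."""
    # The pulse travels to the object and back, so only half the time counts.
    return (echo_duration_us / 2) * SPEED_OF_SOUND_CM_PER_US


# A 1000 microsecond echo corresponds to an object roughly 17 cm away.
print(round(distance_cm(1000), 2))
```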
As a neural network I built a binary classification model.
For training, I built a dataset of 448 hand-picked bear images from Flickr (External Link and External Link). For the other class, I picked 465 images of humans from CCTV cameras, dogs and cats, as these are the most likely to be detected instead of bears. I got those images from Kaggle datasets (External Link, External Link and External Link). I also used 38 images for the validation set.
Because the dataset I made is quite small, I used the Albumentations library for image augmentation.
The final version I have is based on this one: External Link, with small changes to account for my different dataset. The model is MobileNetV1, adapted through transfer learning to act as a binary classifier.
I also tried a model based on MobileNetV3Small and a custom model with few layers. Both proved unusable because, after conversion to TensorFlow Lite, they were too large for the ESP32-CAM, especially the custom model.
After training the model for 30 epochs, I got an accuracy of about 87%:
This lightweight model was only 2.89 MB after training and 302 KB after conversion to TensorFlow Lite, but 1.9 MB after being converted again to a C array, which I placed and used on the board.
Increasing the input size of the model does not increase the size of the model itself by much, but the space needed for tensors grows considerably. Additional layers might also need to be added to the model to downsample the larger input.
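The effect of input size on tensor memory can be illustrated with a rough calculation. The sketch below makes several assumptions that are not stated in this report: one byte per value (an int8-quantized model), a 3-channel input, and a first convolution with 32 filters and stride 2, which is the standard MobileNetV1 first layer at width multiplier 1.0. The input sizes 96 and 192 are arbitrary examples. Convolution weights do not depend on the input resolution, which is why the model file barely grows while the activations do.

```python
def tensor_bytes(height: int, width: int, channels: int, bytes_per_value: int = 1) -> int:
    """Memory needed to hold one activation tensor."""
    return height * width * channels * bytes_per_value


def first_conv_output_bytes(height: int, width: int, filters: int = 32,
                            stride: int = 2, bytes_per_value: int = 1) -> int:
    """Size of the first convolution's output (assumed: 32 filters, stride 2)."""
    out_h = -(-height // stride)  # ceiling division
    out_w = -(-width // stride)
    return tensor_bytes(out_h, out_w, filters, bytes_per_value)


for side in (96, 192):  # example input sizes, not the project's actual value
    input_size = tensor_bytes(side, side, 3)
    conv_size = first_conv_output_bytes(side, side)
    print(f"{side}x{side}: input {input_size} B, first conv output {conv_size} B")
```

Doubling the side of the input quadruples both tensors, so the memory reserved for tensors on the board has to grow with the square of the input side.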
After getting the converted TensorFlow model as a .tflite file, I converted it to a header using the command:
xxd -i model.tflite > model.h
Afterwards, I included the header in the Arduino IDE sketch and uploaded it to the board.
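To show what the generated header contains, here is a small Python sketch that writes bytes in a layout modelled on the output of xxd -i (12 bytes per line, each written as a hex literal). The function name and the placeholder data are illustrative only. It also suggests a likely explanation for the sizes reported earlier, though this is an assumption: each byte takes roughly 6 characters of text, so a 302 KB binary yields a header file of about 1.9 million characters, while the compiled array itself still occupies about as many bytes as the .tflite file.

```python
def to_c_header(data: bytes, array_name: str = "model_tflite") -> str:
    """Render bytes as a C array in a layout modelled on `xxd -i` output."""
    lines = [f"unsigned char {array_name}[] = {{"]
    for start in range(0, len(data), 12):  # 12 bytes per text line
        chunk = data[start:start + 12]
        row = ", ".join(f"0x{b:02x}" for b in chunk)
        is_last = start + 12 >= len(data)
        lines.append(f"  {row}" + ("" if is_last else ","))
    lines.append("};")
    lines.append(f"unsigned int {array_name}_len = {len(data)};")
    return "\n".join(lines) + "\n"


binary = bytes(range(256)) * 1208  # 309,248 placeholder bytes (302 * 1024)
header = to_c_header(binary)
print(f"binary: {len(binary)} bytes, header text: {len(header)} characters")
print(f"characters per byte: {len(header) / len(binary):.2f}")
```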
For storing the bear images online, I chose Firebase because it is free and easy to set up and use.
The code for uploading photos to Firebase requires an API key, which is unique to the project, a user email and password, and a storage bucket ID, which is also unique to the project. The website where the images can be seen looks like this:
I took the code for uploading to Firebase from here: External Link
For sending a new image via email, I chose to use Gmail's SMTP server, following this tutorial: External Link. This approach allows sending emails to multiple recipients at once and is up to date with the Gmail update of 30 May 2022, which changed how an SMTP connection through Gmail is configured on an embedded device.
Email sending can also be implemented quite easily, in about 40-45 lines of code in the Arduino IDE.
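The structure of such a message (several recipients, one JPEG attachment) can be illustrated on a computer with Python's standard library. This is not the on-board Arduino code from the tutorial, only a sketch of the same idea. The subject, body text, file name and function names are hypothetical, and the login step assumes that a Gmail app password is used.

```python
import smtplib
from email.message import EmailMessage
from typing import Sequence


def build_bear_alert(jpeg_bytes: bytes, sender: str,
                     recipients: Sequence[str]) -> EmailMessage:
    """Build an email with the photo attached, addressed to all recipients."""
    msg = EmailMessage()
    msg["Subject"] = "Bear detected"  # hypothetical subject line
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)  # one message, several recipients
    msg.set_content("A bear was detected. The photo is attached.")
    msg.add_attachment(jpeg_bytes, maintype="image", subtype="jpeg",
                       filename="bear.jpg")
    return msg


def send_alert(msg: EmailMessage, user: str, app_password: str) -> None:
    """Send the message through Gmail's SMTP server over TLS."""
    # Assumption: the account authenticates with an app password.
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(user, app_password)
        server.send_message(msg)
```

Keeping message construction separate from sending means the message can be inspected without a network connection or credentials.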
Overall, the implementation of the system is not finished. I built the neural network for detecting bears and deployed it on the ESP32-CAM. The ultrasonic sensor also works well. The problems I am currently facing are connecting to Firebase and to the SMTP server. Until recently, I was able to connect to Firebase with the current ESP32-CAM and upload images, as can be seen in the image from chapter 3.2. One challenge I encountered while working on this project was faulty ESP32-CAM modules (one arrived broken and two broke after a few hours of use). Because I cannot see exactly which images are taken, it is also hard to adapt the neural network for correct inference on the board.
Besides the links from above, I would like to mention: https://www.tensorflow.org/lite/microcontrollers/get_started_low_level
https://www.youtube.com/watch?v=Qu-1RK4Fk7g&ab_channel=AndreasSpiess
https://github.com/tensorflow/tflite-micro/tree/main/tensorflow/lite/micro/examples/person_detection
https://randomnerdtutorials.com/esp32-cam-ov2640-camera-settings/