Pet recognition and monitoring system


The idea for this project came when other neighborhood cats started eating the food of my outdoor cat. Although it would be nice to feed all of them, it would not be a good idea, so I needed a way to identify my cat and track its behavior. This way, I could find out when it prefers to eat and, as a further development, configure the feeder to open whenever my cat is detected.


The purpose of this project is to monitor specific behavior of your house pets. ESP32-CAM boards with cameras and motion detectors will be placed near areas of interest, such as the pet feeder, and will send a picture to a Firebase server whenever the motion sensor is triggered. The server will use an ML model to recognize your specific pets and save the results to a database, which can be used to track the pet's behavior.


Hardware

  • AI Thinker ESP32-CAM - based on the ESP-32S module, which integrates WiFi and Bluetooth
  • OV2640 Camera - supports JPEG image output
  • PIR HC-SR501 Sensor - motion sensor based on passive infrared technology
  • FT232RL USB to Serial Converter - for programming the ESP32-CAM

Most of the difficulty on the hardware side comes from programming the ESP32-CAM, which does not have a USB port and therefore needs the FT232RL USB to serial converter. The PIR sensor only has to be connected to a power source, with its output wired to GPIO13 of the ESP32-CAM.

Software and services

  • Firebase
    • Cloud functions - Used as an HTTP server
    • Realtime database - For saving timestamps regarding pet detection
    • Firebase Hosting - Simple Web interface for monitoring pet behavior and trying different ML models
  • Google Teachable Machine - Classifier generator from Google that outputs TensorFlow models
  • TensorFlow.js - TensorFlow library for JavaScript, used in the Cloud functions
  • Arduino IDE alongside the required libraries for ESP32, HTTP client, WiFi, Firebase client and video camera


In short, the architecture has a classic client-server structure, where the ESP32-CAM is the HTTP client and Firebase is the server. The other software and hardware components are built on top of these two.

Application setup

  • After programming the ESP32 and deploying the Firebase code, the user will need to upload a TensorFlow.js model in order to detect their house pets.
  • Google's Teachable Machine provides an easy way for an average user to generate classifiers. All the end user has to do is provide pictures and select the desired category.
  • Google automatically stores the model and provides a link to it. After the model has been generated, the user should insert the link in the web interface that we provide with Firebase Hosting.

How it works

  • We will now refer to the ESP32-CAM as the HTTP client and to the cloud functions as the server.
  • The client will continuously poll the input pin connected to the motion detector. Once motion is detected, it will take a picture and send it to the server.
  • The server will load the model if it is not already loaded, make a prediction on the received image and send the result back to the client.
  • If the HTTP exchange was successful, the client will send the prediction result, alongside a timestamp in UNIX format, to the realtime database.
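
The last two steps can be illustrated with a minimal sketch. It assumes the classifier output is a list of class names with probabilities (the shape Teachable Machine image models typically produce, declared here as a plain type) and that only the most probable class is kept. The names ClassScore, Detection, topPrediction, toUnixSeconds and buildDetection are hypothetical and do not come from the project code.

```typescript
// One entry of the classifier output: a class label and its probability.
// Assumed shape, declared locally so the sketch has no dependencies.
interface ClassScore {
  className: string;
  probability: number;
}

// Record stored after a successful exchange (hypothetical shape).
interface Detection {
  timestamp: number; // UNIX time: seconds since 1970-01-01 UTC
  prediction: string;
}

// Return the most probable class, or undefined for an empty score list.
function topPrediction(scores: ClassScore[]): ClassScore | undefined {
  let best: ClassScore | undefined;
  for (const score of scores) {
    if (best === undefined || score.probability > best.probability) {
      best = score;
    }
  }
  return best;
}

// Convert a millisecond clock reading (such as Date.now()) to UNIX seconds.
function toUnixSeconds(nowMs: number): number {
  return Math.floor(nowMs / 1000);
}

// Combine the prediction and the timestamp into one record.
function buildDetection(scores: ClassScore[], nowMs: number): Detection | undefined {
  const best = topPrediction(scores);
  if (best === undefined) {
    return undefined;
  }
  return { timestamp: toUnixSeconds(nowMs), prediction: best.className };
}
```

Passing the clock reading in as a parameter, instead of calling Date.now() inside the function, keeps the sketch easy to test.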

How the Web interface works

  • The web interface is completely based on Firebase Hosting.
  • The main page will present a text box for providing a link to a new model and a playground area where you can drag and drop pictures in order to test the model.
  • It will also provide an interface for visualizing the data stored in the realtime database by the ESP32.
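
As an illustration of what the visualization could compute, the sketch below counts detections per hour of day to find the time a pet prefers to eat. It assumes the data has already been read from the database into an object that maps UNIX timestamps (in seconds) to predicted labels. The names DetectionLog, detectionsByHour and busiestHour are hypothetical, and hours are taken in UTC for simplicity; a real interface would probably convert to local time.

```typescript
// Detections as an object: UNIX timestamp (seconds, as a string key)
// mapped to the predicted label. Assumed shape, not taken from the project.
type DetectionLog = { [timestamp: string]: string };

// Count detections of `label` for each hour of the day (0-23, UTC).
function detectionsByHour(log: DetectionLog, label: string): number[] {
  const counts: number[] = [];
  for (let hour = 0; hour < 24; hour++) {
    counts.push(0);
  }
  for (const key of Object.keys(log)) {
    if (log[key] !== label) continue;
    const seconds = Number(key);
    if (!isFinite(seconds)) continue; // skip malformed keys
    const hour = new Date(seconds * 1000).getUTCHours();
    counts[hour] += 1;
  }
  return counts;
}

// Hour with the most detections, or undefined when there are none.
// Ties are resolved in favor of the earliest hour.
function busiestHour(counts: number[]): number | undefined {
  let bestHour: number | undefined;
  let bestCount = 0;
  for (let hour = 0; hour < counts.length; hour++) {
    if (counts[hour] > bestCount) {
      bestCount = counts[hour];
      bestHour = hour;
    }
  }
  return bestHour;
}
```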

Software architecture - ESP32

As previously described, the actions taken by the ESP32 are quite simple and are described in the following flowchart. The PIR sensor does most of the detection work, and we simply read GPIO13 to see whether motion was detected. If motion is detected, we take a picture and send it to the server. If the exchange is successful and we receive the HTTP OK code, we send the prediction result alongside a timestamp to the realtime database.
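
The same control flow can be written down as a short sketch. The actual firmware would be an Arduino program in C++; the TypeScript below only models the decision logic, with the hardware and network operations passed in as callbacks so that it can run without a board. ClientDeps, StepResult and clientStep are hypothetical names, and the HTTP exchange is simplified to a synchronous call.

```typescript
// Hardware and network operations, injected so the flow can be exercised
// without a board. All member names are hypothetical.
interface ClientDeps {
  motionDetected(): boolean; // e.g. the state of GPIO13
  takePicture(): Uint8Array; // JPEG bytes from the camera
  requestPrediction(jpeg: Uint8Array): { status: number; prediction: string };
  storeDetection(timestamp: number, prediction: string): void;
}

type StepResult = "idle" | "request_failed" | "stored";

// One pass of the client loop:
// poll the sensor, take a picture, ask the server, store the result.
function clientStep(deps: ClientDeps, nowMs: number): StepResult {
  if (!deps.motionDetected()) {
    return "idle"; // nothing moved, check again on the next pass
  }
  const jpeg = deps.takePicture();
  const response = deps.requestPrediction(jpeg);
  if (response.status !== 200) {
    return "request_failed"; // no HTTP OK, so nothing is stored
  }
  const unixSeconds = Math.floor(nowMs / 1000);
  deps.storeDetection(unixSeconds, response.prediction);
  return "stored";
}
```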

Software architecture - Firebase

On the server side of things, we will have two cloud functions, a realtime database and, through Firebase Hosting, a web interface.

  • Cloud functions: /predictImage
    • This function will wait for a request containing an image and respond with the predicted content of that image.
  • Cloud functions: /changeModel
    • This function will wait for a request containing a link to a new model, and will save that link to a file, from which it will be read by the /predictImage function when loading the model. It will also invalidate the currently loaded model, so that it will be reloaded.
  • Realtime Database: The only purpose of the database is to store the data sent by the ESP32, in the format /cats/{timestamp}/{prediction}. The stored information can then be used for whatever purpose the user has in mind, for example to display it in the web interface.
  • Firebase Hosting: This will be used for our web interface that will provide the user with an easy way to test new models, upload new models to the cloud functions and visualize data in the realtime database.
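
The interplay between /changeModel and /predictImage boils down to a cache with explicit invalidation, which the sketch below illustrates together with the database path format. It makes a few assumptions: the model link is kept in memory instead of in a file, the loader is a hypothetical callback (in the real function it would most likely call into TensorFlow.js and return a promise), and characters that Firebase Realtime Database does not allow in keys are replaced with underscores. ModelCache and detectionPath are hypothetical names.

```typescript
// Lazily loaded model with explicit invalidation.
// M is whatever the loader returns, e.g. a promise of a model.
class ModelCache<M> {
  private load: (url: string) => M;
  private url: string | undefined;
  private model: M | undefined;
  private loaded = false;

  constructor(load: (url: string) => M) {
    this.load = load;
  }

  // What /changeModel would do: remember the new link, drop the old model.
  changeModel(url: string): void {
    this.url = url;
    this.model = undefined;
    this.loaded = false;
  }

  // What /predictImage would do before predicting: load only when needed.
  get(): M {
    if (this.url === undefined) {
      throw new Error("no model link configured");
    }
    if (!this.loaded) {
      this.model = this.load(this.url);
      this.loaded = true;
    }
    return this.model as M;
  }
}

// Characters that are not allowed in Realtime Database keys.
const FORBIDDEN_KEY_CHARS = /[.$#\[\]\/]/g;

// Build the path /cats/{timestamp}/{prediction} for one detection.
function detectionPath(timestamp: number, prediction: string): string {
  const key = prediction.replace(FORBIDDEN_KEY_CHARS, "_");
  return "/cats/" + timestamp + "/" + key;
}
```

Keeping the cache in memory means a freshly started function instance has to load the model again, which is the case the "if it's not already loaded" check covers.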

Challenges and things to improve

  • Finding an easy way for a user to create their own machine learning model with minimal knowledge required. The main idea of the system was that all the user has to do is provide pictures of their house cats. I tried multiple free models with varying degrees of success, but none were as good as Google's Teachable Machine, which provides a nice web interface to generate classifiers and export them as TensorFlow models.
  • Testing was, and still is, hard to do without an actual cat. I tested the model through the web interface by uploading pictures that had already been taken, but most importantly I had to test the predictions with pictures from the ESP32 camera. To do that, I placed my phone, displaying a picture of a cat, in front of the camera and triggered the motion sensor. Because the camera has quite a low resolution and the cat was shown on a not-so-good screen, the predictions of the model were less accurate.
  • The algorithm for cat detection used on the ESP32 could still be improved. For example, the ESP32 could take multiple pictures when the motion sensor is triggered, in case the cat is not detected properly even though it was there. Also, it would be nice to provide our own classifier model and web interface, and not rely on Teachable Machine.
  • The project also depends on a stable WiFi connection. It would be nice if the ESP32 saved the pictures in its RAM or on persistent storage and sent them whenever a connection is available. Or, even better, it could use a SIM card for internet connectivity so the hardware could easily work outdoors.
  • The board could also connect to the internet only periodically in order to conserve energy.
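
The multiple-picture idea could look roughly like the sketch below, which is not part of the current project. It retries up to a fixed number of frames and stops as soon as the target class is seen with enough confidence. classifyBurst and its parameters are hypothetical, and captureAndClassify stands for one picture plus one round trip to the server, returning the top class.

```typescript
// Top class of one frame (assumed shape, declared locally).
interface ClassScore {
  className: string;
  probability: number;
}

// Take up to maxFrames pictures after a motion event and stop early
// once targetLabel is predicted with at least minProbability.
function classifyBurst(
  captureAndClassify: () => ClassScore,
  targetLabel: string,
  maxFrames: number,
  minProbability: number
): { detected: boolean; framesUsed: number } {
  let framesUsed = 0;
  while (framesUsed < maxFrames) {
    framesUsed++;
    const top = captureAndClassify();
    if (top.className === targetLabel && top.probability >= minProbability) {
      return { detected: true, framesUsed: framesUsed };
    }
  }
  return { detected: false, framesUsed: framesUsed };
}
```

A small maxFrames keeps the extra network traffic and power draw bounded, which matters given the connectivity and energy concerns listed above.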


This project provides an easy way to control and monitor the behavior of your house pets. Although there is still room for improvement in the ease of setting up the project and in the finesse of the detection algorithm, it provides a good basis for further development. All of the basic requirements are implemented and quite useful, requiring just a bit more polishing. In summary, this project integrates robust web technologies and relatively cheap hardware in order to fulfill a useful task.


iothings/proiecte/2023/petrecognitionandmonitoring.txt · Last modified: 2024/01/12 17:25 by nicu.loghinescu
CC Attribution-Share Alike 3.0 Unported