Unsupervised learning is a type of machine learning where models work with data that hasn’t been labeled. Instead of learning from examples with known outcomes, the algorithm explores the structure of the data on its own, often identifying patterns, clusters, or anomalies.
In embedded development, unsupervised learning is becoming increasingly relevant. Devices like microcontrollers or edge AI chips can use these techniques to make smart, local decisions without needing constant cloud access or pre-labeled datasets.
Embedded systems often operate in dynamic environments—factories, vehicles, wearables—where conditions can shift over time. It's not always feasible to collect labeled data for every possible situation. Unsupervised learning offers a way to adapt by:

- Discovering structure in raw sensor data without any labels
- Grouping similar readings into clusters that reflect distinct operating modes
- Flagging readings that deviate from learned normal behavior
By running these models directly on embedded hardware, systems can respond in real-time, save bandwidth, and improve privacy by keeping data local.
Clustering is one of the most common unsupervised learning techniques. It involves organizing data into groups—or clusters—based on similarity. In embedded systems, this can help in:

- Grouping sensor readings into recognizable environmental or operating states
- Recognizing user activities from motion data on wearables
- Segmenting usage patterns without any labeled examples
For example, a smart thermostat might cluster temperature and humidity data to recognize different types of weather patterns, without being explicitly told what those patterns are.
Algorithms like k-means, hierarchical clustering, or even lightweight neural networks can be adapted to run efficiently on embedded platforms, especially when paired with dimensionality reduction or data quantization.
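To make that concrete, here is a minimal sketch of the k-means inference step, nearest-centroid assignment, written the way it would run on a microcontroller: fixed-size arrays and a few multiply-adds, no dynamic memory. K, DIM, and the centroid values are placeholders, not numbers from any of the examples below.

```cpp
#include <float.h>

#define K   4   // number of clusters (placeholder)
#define DIM 3   // feature dimensions (placeholder)

// Centroids learned offline or on-device; values are illustrative.
static float centroids[K][DIM];

// Return the index of the nearest centroid (squared Euclidean distance).
// This is the entire inference step of k-means: no dynamic allocation,
// and the cost is just K * DIM multiply-adds per sample.
int assign_cluster(const float x[DIM]) {
    int best = 0;
    float bestDist = FLT_MAX;
    for (int k = 0; k < K; k++) {
        float d = 0.0f;
        for (int i = 0; i < DIM; i++) {
            float diff = x[i] - centroids[k][i];
            d += diff * diff;
        }
        if (d < bestDist) { bestDist = d; best = k; }
    }
    return best;
}
```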
Anomaly detection is another valuable application of unsupervised learning. The idea is to learn what “normal” looks like, and then flag data that deviates from that baseline. This is especially useful in embedded systems for:

- Predictive maintenance, such as spotting unusual vibration or temperature before a failure
- Detecting faulty sensors or degraded components
- Flagging unexpected audio, motion, or usage events
Since anomalies are rare and often undefined ahead of time, unsupervised methods can help by learning patterns from normal operation and identifying outliers on the fly.
Techniques like autoencoders, clustering-based thresholds, or statistical models are commonly used—and with optimizations, many of these can run on low-power edge devices.
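As one illustration of the statistical end of that spectrum, a baseline detector can be little more than a running mean and variance (Welford's online update) plus a z-score test. This is a hypothetical sketch, not code from the projects that follow:

```cpp
#include <math.h>

// Running statistics over "normal" samples (Welford's online algorithm):
// numerically stable, O(1) memory, one pass over the data.
struct RunningStats {
    long  n = 0;
    float mean = 0.0f;
    float m2 = 0.0f;   // sum of squared deviations from the mean

    void update(float x) {
        n++;
        float delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }
    float stddev() const { return n > 1 ? sqrtf(m2 / (n - 1)) : 0.0f; }
};

// Flag a sample as anomalous if it lies more than `k` standard
// deviations from the mean learned during normal operation.
bool is_anomaly(const RunningStats &s, float x, float k = 3.0f) {
    float sd = s.stddev();
    return sd > 0.0f && fabsf(x - s.mean) > k * sd;
}
```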
Running unsupervised learning on embedded systems isn’t without trade-offs. Developers must consider:

- Tight memory and compute budgets on microcontrollers
- Power consumption, especially on battery-operated devices
- Model size, which often demands quantization or dimensionality reduction
Frameworks like TensorFlow Lite for Microcontrollers, TinyML, and Edge Impulse are making it easier to deploy models that are small, fast, and energy-efficient.
K-means clustering is an unsupervised learning technique that groups data points into clusters based on how similar they are to one another. Unlike supervised methods that rely on labeled examples, K-means works with unlabeled data to reveal underlying patterns or structure. For instance, a wearable device might use K-means to distinguish between different activities, such as sleeping, walking, or running.
IMU Activity Clustering (K-means from scratch) uses the LSM6DSL accelerometer/gyroscope to extract per-window features—mean, variance, energy, and spectral centroid from FFT magnitude bands—and feeds them into a K-means model (K=3–5) with k-means++ initialization, optionally supporting streaming centroid updates via a small learning rate. The system clusters unlabeled motion into “idle,” “walking,” and “shaking,” then displays the cluster ID on the OLED and sets the NeoPixel color by cluster. It’s a pure unsupervised pipeline from raw time-series data that lets you directly see how feature choices influence the resulting clusters.
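To make the feature step concrete, here is a rough sketch of how such a 4-D vector might be computed from one window. The window length, bin count, and function names are placeholder assumptions, not values lifted from the actual example:

```cpp
#define WIN_LEN 208   // ~2 s at 104 Hz (matches the description above)
#define N_BINS  64    // FFT magnitude bins (placeholder size)

// Build the 4-D feature vector for one window of motion samples.
// `mag[]` holds precomputed FFT magnitudes for the same window (e.g.,
// produced by arduinoFFT); here the centroid is taken over all bins
// rather than grouped bands, for brevity.
void extract_features(const float win[WIN_LEN], const float mag[N_BINS],
                      float feat[4]) {
    // Mean and variance of the raw window
    float mean = 0.0f;
    for (int i = 0; i < WIN_LEN; i++) mean += win[i];
    mean /= WIN_LEN;

    float var = 0.0f, energy = 0.0f;
    for (int i = 0; i < WIN_LEN; i++) {
        float d = win[i] - mean;
        var += d * d;
        energy += win[i] * win[i];
    }
    var /= WIN_LEN;
    energy /= WIN_LEN;

    // Spectral centroid: magnitude-weighted average bin index
    float num = 0.0f, den = 0.0f;
    for (int k = 0; k < N_BINS; k++) {
        num += k * mag[k];
        den += mag[k];
    }
    float centroid = (den > 0.0f) ? num / den : 0.0f;

    feat[0] = mean; feat[1] = var; feat[2] = energy; feat[3] = centroid;
}
```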
We're going to need this platformio.ini file:
```ini
[env:esp32c6_sparrow]
; Use the pioarduino platform (Arduino-ESP32 3.x w/ C6 support)
platform = https://github.com/pioarduino/platform-espressif32/releases/download/54.03.20/platform-espressif32.zip
board = esp32-c6-devkitm-1
framework = arduino

; Faster serial + some optimization
monitor_speed = 115200
build_flags =
    -O2
    -DARDUINO_USB_MODE=1
    -DARDUINO_USB_CDC_ON_BOOT=1

; Libraries
lib_deps =
    https://github.com/dycodex/LSM6DSL-Arduino.git
    arduinoFFT @ ^1.6.1
    adafruit/Adafruit NeoPixel @ ^1.12.3
```
Build and upload the code example here.
The code samples the IMU at ~104 Hz in 2-second windows, builds a 4-D feature vector per window, and stores ~40 windows to bootstrap. It then runs k-means++ initialization plus batch k-means, and finally switches to streaming inference with a gentle centroid update. Once the training period completes, the code prints the features and cluster assignment over serial and sets the NeoPixel color by cluster (0 = red, 1 = green, 2 = blue).
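The streaming phase reduces to one update rule: after assigning a window to its nearest centroid, nudge that centroid slightly toward the new point. A hedged sketch, with ETA as an illustrative learning rate rather than the constant the example actually uses:

```cpp
#define ETA 0.05f   // small learning rate (illustrative value)

// After batch training, keep adapting: move the winning centroid a
// small step toward each new feature vector so the clusters can track
// slow drift without retraining from scratch.
void streaming_update(float centroid[4], const float feat[4]) {
    for (int i = 0; i < 4; i++) {
        centroid[i] += ETA * (feat[i] - centroid[i]);
    }
}
```

A small ETA keeps the clusters stable while still letting them drift with the data; a large one would let a burst of unusual motion drag a centroid away from its activity.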
This ML example turns the ESP32 into a real-time audio anomaly detector. It learns what “normal” audio sounds like for a few seconds, then continuously flags frames that look different. The NeoPixel shows:
- Green = audio looks normal
- Red = anomaly (outlier)
The code goes through these steps:

- Capture audio frames and reduce each one to a compact spectral feature vector (FFT band energies).
- During a short startup period, treat the incoming frames as “normal” and cluster them with k-means.
- For every new frame, compute the distance to its nearest cluster centroid.
- Flag the frame as an anomaly when that distance exceeds threshold = dist_median + ANOM_MULT × dist_mad (with ANOM_MULT = 3.0), sketched below.
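Because the median and MAD are robust statistics, a few noisy frames during training won't inflate the threshold the way a mean and standard deviation would. Here is a rough sketch of how that threshold could be computed; the function and variable names are illustrative, not taken from the example code:

```cpp
#include <math.h>
#include <string.h>
#include <stdlib.h>

// Comparison function for qsort
static int cmp_float(const void *a, const void *b) {
    float fa = *(const float *)a, fb = *(const float *)b;
    return (fa > fb) - (fa < fb);
}

// Median of an array (sorts a local copy; assumes n <= 256 frames).
static float median_of(const float *v, int n) {
    float tmp[256];
    memcpy(tmp, v, n * sizeof(float));
    qsort(tmp, n, sizeof(float), cmp_float);
    return (n % 2) ? tmp[n / 2] : 0.5f * (tmp[n / 2 - 1] + tmp[n / 2]);
}

// threshold = median + ANOM_MULT * MAD, computed over the distances of
// the training frames to their nearest centroid.
float anomaly_threshold(const float *dist, int n, float anomMult = 3.0f) {
    float med = median_of(dist, n);
    float dev[256];
    for (int i = 0; i < n; i++) dev[i] = fabsf(dist[i] - med);
    float mad = median_of(dev, n);
    return med + anomMult * mad;
}
```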
In order to make it run properly, you may need to tune the detector to your environment. If it’s too sensitive or not sensitive enough, tweak:

- ANOM_MULT (e.g., 2.5–4.0)
- K_CLUSTERS (3–6 often fine)
- FFT size (512/1024) or N_BANDS (8–24)

This example turns an ESP32-C6 “Sparrow” into a light-state recognizer.
Each loop tick runs every 100 ms to achieve 10 Hz sampling. The raw brightness is compressed to a [0,1] scale with a log normalization that treats 64k as a generous upper bound, then smoothed with an exponential moving average using α=0.3. The code also maintains a short ring buffer covering roughly three seconds of these smoothed values to estimate a windowed standard deviation, which acts as a quick measure of short-term variability.
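A minimal sketch of that per-tick pipeline, assuming the constants from the description (α = 0.3, a ~3 s buffer at 10 Hz, 65535 as the brightness ceiling); the function names are mine, not the example's:

```cpp
#include <math.h>

#define EMA_ALPHA 0.3f
#define RING_N    30        // ~3 s of samples at 10 Hz

static float ema = 0.0f;
static float ring[RING_N];
static int   ringIdx = 0;

// Compress raw brightness counts into [0,1] with a log curve, treating
// 65535 as a generous upper bound, then smooth with an EMA.
float process_sample(float raw) {
    float norm = logf(1.0f + raw) / logf(1.0f + 65535.0f);
    if (norm > 1.0f) norm = 1.0f;
    ema = EMA_ALPHA * norm + (1.0f - EMA_ALPHA) * ema;

    ring[ringIdx] = ema;              // keep ~3 s of smoothed values
    ringIdx = (ringIdx + 1) % RING_N;
    return ema;
}

// Standard deviation over the ring buffer: a cheap proxy for
// short-term variability (flicker, clouds, someone walking past).
float window_stddev() {
    float mean = 0.0f;
    for (int i = 0; i < RING_N; i++) mean += ring[i];
    mean /= RING_N;
    float var = 0.0f;
    for (int i = 0; i < RING_N; i++) {
        float d = ring[i] - mean;
        var += d * d;
    }
    return sqrtf(var / RING_N);
}
```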
Unsupervised clustering is done online with a very small, one-dimensional k-means-like scheme. Five cluster centroids are seeded across the [0,1] range, and for about forty seconds (TRAIN_SAMPLES=400 at 10 Hz) the system is in a training phase: each new smoothed sample is assigned to its nearest cluster by absolute distance, and that cluster’s running mean and variance are updated. During training it prints progress and the evolving means.

After training, each new sample is again assigned to its nearest cluster to produce a cluster index, along with the cluster’s standard deviation and the short-window deviation from the ring buffer. A human-readable room state is then chosen. If the user has previously assigned a manual label to that cluster via a tiny serial REPL, that label is used; otherwise a heuristic converts normalized level and short-term variability into categories like “night”, “full_sun”, “lights_on”, “shade/day_indirect”, or “transition”.

To avoid label flicker, a simple hysteresis keeps a new label “pending” until it appears twice in a row. The cluster means continue to adapt slowly during inference so the model can track gradual daylight changes.
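The core of that scheme fits in a few lines. Below is a hedged sketch of the 1-D nearest-cluster assignment with a running-mean update, plus the two-in-a-row hysteresis; the seed values and function names are illustrative:

```cpp
#include <math.h>

#define N_CLUSTERS 5

// Centroids seeded across [0,1]; counts track samples per cluster.
static float means[N_CLUSTERS]  = {0.1f, 0.3f, 0.5f, 0.7f, 0.9f};
static long  counts[N_CLUSTERS] = {0};

// Assign a smoothed brightness sample to its nearest 1-D centroid and
// update that centroid's running mean (the training-phase update).
int assign_and_update(float x) {
    int best = 0;
    for (int k = 1; k < N_CLUSTERS; k++) {
        if (fabsf(x - means[k]) < fabsf(x - means[best])) best = k;
    }
    counts[best]++;
    means[best] += (x - means[best]) / counts[best];  // running mean
    return best;
}

// Two-in-a-row hysteresis: a new label stays "pending" until it is
// seen twice consecutively, so the reported state doesn't flicker.
int stable_label(int candidate) {
    static int pending = -1, stable = -1;
    if (candidate == pending) stable = candidate;
    pending = candidate;
    return stable;
}
```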
Labels can be managed over Serial with commands such as setlabel <k> <name>, savelabels, and labels. The Preferences API persists these names in NVS under a “labels” namespace so they survive reboots. Throughout, the program reports the sensor type, the raw brightness, the normalized and EMA values, the chosen cluster and its statistics, the short-window deviation, and the stable label, making it easy to tune thresholds or swap sensors without changing application logic.
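The Preferences calls involved are the standard ESP32 Arduino ones. A minimal sketch of persisting and restoring a label (the key names are assumptions, not necessarily what the example uses):

```cpp
#include <Preferences.h>

Preferences prefs;

// Persist a user-assigned cluster label in NVS under the "labels"
// namespace so it survives reboots. Keys "k0".."k4" are illustrative.
void save_label(int cluster, const String &name) {
    prefs.begin("labels", false);          // open namespace read-write
    String key = "k" + String(cluster);
    prefs.putString(key.c_str(), name);
    prefs.end();
}

String load_label(int cluster) {
    prefs.begin("labels", true);           // open namespace read-only
    String key = "k" + String(cluster);
    String name = prefs.getString(key.c_str(), "");  // "" if unset
    prefs.end();
    return name;
}
```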
Download the code here.
Add these two lines to your lib_deps in platformio.ini:
```ini
https://github.com/DFRobot/DFRobot_LTR308.git
adafruit/Adafruit LTR329 and LTR303@^2.0.1
```