Unsupervised learning is a type of machine learning where models work with data that hasn’t been labeled. Instead of learning from examples with known outcomes, the algorithm explores the structure of the data on its own, often identifying patterns, clusters, or anomalies.
In embedded development, unsupervised learning is becoming increasingly relevant. Devices like microcontrollers or edge AI chips can use these techniques to make smart, local decisions without needing constant cloud access or pre-labeled datasets.
Embedded systems often operate in dynamic environments—factories, vehicles, wearables—where conditions can shift over time. It's not always feasible to collect labeled data for every possible situation. Unsupervised learning offers a way to adapt by discovering structure in incoming sensor data, grouping similar behavior, and flagging readings that don't fit established patterns.
By running these models directly on embedded hardware, systems can respond in real-time, save bandwidth, and improve privacy by keeping data local.
Clustering is one of the most common unsupervised learning techniques. It involves organizing data into groups—or clusters—based on similarity. In embedded systems, this can help with tasks such as recognizing distinct operating modes, grouping similar sensor readings, or segmenting user activity.
For example, a smart thermostat might cluster temperature and humidity data to recognize different types of weather patterns, without being explicitly told what those patterns are.
Algorithms like k-means, hierarchical clustering, or even lightweight neural networks can be adapted to run efficiently on embedded platforms, especially when paired with dimensionality reduction or data quantization.
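To make that concrete, below is a minimal batch k-means sketch in plain C++ with no dynamic allocation, sized for a microcontroller. K, DIM, and the buffer size are illustrative, and centroid seeding (e.g., k-means++) is assumed to have happened already.

#include <float.h>

#define K 3       // number of clusters
#define DIM 4     // features per sample
#define N 40      // samples in the bootstrap buffer

static float data[N][DIM];       // feature vectors, filled beforehand
float centroids[K][DIM];         // seeded beforehand (e.g., k-means++)

static float dist2(const float *a, const float *b) {
  float s = 0;
  for (int d = 0; d < DIM; d++) { float t = a[d] - b[d]; s += t * t; }
  return s;
}

// One Lloyd iteration: assign every sample to its nearest centroid, then
// recompute each centroid as the mean of the samples assigned to it.
void kmeansIteration(void) {
  float sums[K][DIM] = {{0}};
  int counts[K] = {0};
  for (int i = 0; i < N; i++) {
    int best = 0;
    float bestD = FLT_MAX;
    for (int k = 0; k < K; k++) {
      float d = dist2(data[i], centroids[k]);
      if (d < bestD) { bestD = d; best = k; }
    }
    counts[best]++;
    for (int d = 0; d < DIM; d++) sums[best][d] += data[i][d];
  }
  for (int k = 0; k < K; k++)
    if (counts[k] > 0)
      for (int d = 0; d < DIM; d++) centroids[k][d] = sums[k][d] / counts[k];
}

Calling kmeansIteration() a few dozen times (or until assignments stop changing) is typically enough for small feature sets like these.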
Anomaly detection is another valuable application of unsupervised learning. The idea is to learn what “normal” looks like, and then flag data that deviates from that baseline. This is especially useful in embedded systems for tasks such as predictive maintenance, fault detection, and spotting unusual sensor or network behavior.
Since anomalies are rare and often undefined ahead of time, unsupervised methods can help by learning patterns from normal operation and identifying outliers on the fly.
Techniques like autoencoders, clustering-based thresholds, or statistical models are commonly used—and with optimizations, many of these can run on low-power edge devices.
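As one concrete statistical approach, the sketch below learns a running mean and variance with Welford's algorithm and flags readings that sit far from the baseline. The warm-up count and 4-sigma threshold are illustrative choices, not fixed rules.

#include <math.h>

static float mean = 0, m2 = 0;   // running mean and sum of squared deviations
static unsigned long n = 0;

// Returns true once a reading deviates more than 4 standard deviations
// from the learned baseline. Needs ~100 samples before it starts flagging.
bool isAnomaly(float x) {
  n++;
  float delta = x - mean;
  mean += delta / n;             // Welford update of the running mean
  m2 += delta * (x - mean);      // Welford update of the deviation sum
  if (n < 100) return false;     // wait until the baseline is stable
  float stddev = sqrtf(m2 / (n - 1));
  return fabsf(x - mean) > 4.0f * stddev;
}

This fits in a handful of bytes of state, which is why simple statistical baselines are often the first choice on low-power devices before reaching for an autoencoder.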
Running unsupervised learning on embedded systems isn’t without trade-offs. Developers must weigh constraints such as limited RAM and flash, tight compute and power budgets, and how (or whether) models can be updated once deployed.
Frameworks like TensorFlow Lite for Microcontrollers and Edge Impulse, along with the broader TinyML ecosystem, are making it easier to deploy models that are small, fast, and energy-efficient.
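For a sense of scale, inference with TensorFlow Lite for Microcontrollers usually follows the shape below. This is a hedged outline, not a drop-in implementation: g_model_data, the op list, and the arena size are placeholders for your own exported model, and constructor details have varied across TFLM releases.

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern const unsigned char g_model_data[];   // trained model exported as a C array

constexpr int kArenaSize = 16 * 1024;        // tune to the model's needs
static uint8_t tensor_arena[kArenaSize];

static const tflite::Model *model = nullptr;
static tflite::MicroMutableOpResolver<2> resolver;
static tflite::MicroInterpreter *interpreter = nullptr;

void modelSetup() {
  model = tflite::GetModel(g_model_data);
  resolver.AddFullyConnected();              // register only the ops the model uses
  resolver.AddRelu();
  static tflite::MicroInterpreter static_interpreter(
      model, resolver, tensor_arena, kArenaSize);
  interpreter = &static_interpreter;
  interpreter->AllocateTensors();            // carve tensors out of the arena
}

float runModel(const float *features, int nFeatures) {
  TfLiteTensor *input = interpreter->input(0);
  for (int i = 0; i < nFeatures; i++) input->data.f[i] = features[i];
  if (interpreter->Invoke() != kTfLiteOk) return -1.0f;
  return interpreter->output(0)->data.f[0];  // e.g., an anomaly score
}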
K-means clustering is an unsupervised learning technique that groups data points into clusters based on how similar they are to one another. Unlike supervised methods that rely on labeled examples, K-means works with unlabeled data to reveal underlying patterns or structure. For instance, a wearable device might use K-means to distinguish between different activities, such as sleeping, walking, or running.
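Formally, "similarity" here means closeness in squared Euclidean distance: k-means seeks K centroids $\mu_k$ that minimize the within-cluster sum of squares

$$\min_{\mu_1,\dots,\mu_K}\ \sum_{k=1}^{K}\ \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,$$

where $C_k$ is the set of points currently assigned to centroid $\mu_k$. Lloyd's algorithm alternates the assignment and centroid-update steps shown in the sketch above until the assignments stop changing.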
IMU Activity Clustering (K-means from scratch) uses the LSM6DSL accelerometer/gyroscope to extract per-window features—mean, variance, energy, and spectral centroid from FFT magnitude bands—and feeds them into a K-means model (K=3–5) with k-means++ initialization, optionally supporting streaming centroid updates via a small learning rate. The system clusters unlabeled motion into “idle,” “walking,” and “shaking,” then displays the cluster ID on the OLED and sets the NeoPixel color by cluster. It’s a pure unsupervised pipeline from raw time-series data that lets you directly see how feature choices influence the resulting clusters.
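As a rough sketch of that feature stage (names are illustrative; the mag array of FFT magnitudes is assumed to be computed elsewhere, e.g., with arduinoFFT):

// Per-window feature vector: mean, variance, energy, and spectral centroid.
// x holds one 2-s window of samples; mag holds the FFT magnitude spectrum;
// binHz is the frequency resolution (sample rate / FFT length).
struct Features { float mean, var, energy, centroid; };

Features extractFeatures(const float *x, int n,
                         const float *mag, int bins, float binHz) {
  Features f = {0, 0, 0, 0};
  for (int i = 0; i < n; i++) f.mean += x[i];
  f.mean /= n;
  for (int i = 0; i < n; i++) {
    float d = x[i] - f.mean;
    f.var += d * d;            // accumulate squared deviation
    f.energy += x[i] * x[i];   // accumulate signal energy
  }
  f.var /= n;
  f.energy /= n;
  // Spectral centroid: magnitude-weighted mean frequency (bin 0 = DC, skipped).
  float wsum = 0, msum = 0;
  for (int k = 1; k < bins; k++) { wsum += k * binHz * mag[k]; msum += mag[k]; }
  f.centroid = (msum > 0) ? (wsum / msum) : 0;
  return f;
}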
We're going to need this platformio.ini file:
[env:esp32c6_sparrow]
; Use the pioarduino platform (Arduino-ESP32 3.x w/ C6 support)
platform = https://github.com/pioarduino/platform-espressif32/releases/download/54.03.20/platform-espressif32.zip
board = esp32-c6-devkitm-1
framework = arduino

; Faster serial + some optimization
monitor_speed = 115200
build_flags =
    -O2
    -DARDUINO_USB_MODE=1
    -DARDUINO_USB_CDC_ON_BOOT=1

; Libraries
lib_deps =
    https://github.com/dycodex/LSM6DSL-Arduino.git
    arduinoFFT @ ^1.6.1
    adafruit/Adafruit NeoPixel @ ^1.12.3
Build and upload the code example here.
The code samples the IMU at ~104 Hz in 2-second windows, builds a 4-D feature vector per window, and stores ~40 windows to bootstrap. It then runs k-means++ initialization followed by batch k-means, and finally switches to streaming inference with a gentle centroid update. Once the training period completes, the code prints the features and cluster assignment over serial and sets the NeoPixel color by cluster (0=red, 1=green, 2=blue).
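The streaming phase can be pictured with this minimal sketch (ALPHA is an illustrative learning rate; centroids comes from the batch phase sketched earlier):

#include <float.h>

#define K 3                      // clusters (matches the batch sketch)
#define DIM 4                    // features per window
#define ALPHA 0.05f              // illustrative streaming learning rate

extern float centroids[K][DIM];  // trained by the batch phase

static float dist2(const float *a, const float *b) {
  float s = 0;
  for (int d = 0; d < DIM; d++) { float t = a[d] - b[d]; s += t * t; }
  return s;
}

// Assign the new window to its nearest centroid, then nudge that centroid
// a small step toward the sample so the model tracks slow drift.
int classifyAndAdapt(const float *f) {
  int best = 0;
  float bestD = FLT_MAX;
  for (int k = 0; k < K; k++) {
    float d = dist2(f, centroids[k]);
    if (d < bestD) { bestD = d; best = k; }
  }
  for (int d = 0; d < DIM; d++)
    centroids[best][d] += ALPHA * (f[d] - centroids[best][d]);
  return best;                   // cluster ID selects the NeoPixel color
}

A small ALPHA keeps the clusters stable while still letting them drift with the wearer's behavior; setting it to zero freezes the model after training.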