Unsupervised learning is a type of machine learning where models work with data that hasn’t been labeled. Instead of learning from examples with known outcomes, the algorithm explores the structure of the data on its own, often identifying patterns, clusters, or anomalies.
In embedded development, unsupervised learning is becoming increasingly relevant. Devices like microcontrollers or edge AI chips can use these techniques to make smart, local decisions without needing constant cloud access or pre-labeled datasets.
Embedded systems often operate in dynamic environments—factories, vehicles, wearables—where conditions can shift over time. It's not always feasible to collect labeled data for every possible situation. Unsupervised learning offers a way to adapt by discovering structure in incoming sensor data, grouping similar behavior, and flagging readings that don't fit established patterns.
By running these models directly on embedded hardware, systems can respond in real-time, save bandwidth, and improve privacy by keeping data local.
Clustering is one of the most common unsupervised learning techniques. It involves organizing data into groups—or clusters—based on similarity. In embedded systems, this can help with tasks such as recognizing distinct operating modes, grouping similar sensor readings, or segmenting user activity.
For example, a smart thermostat might cluster temperature and humidity data to recognize different types of weather patterns, without being explicitly told what those patterns are.
Algorithms like k-means, hierarchical clustering, or even lightweight neural networks can be adapted to run efficiently on embedded platforms, especially when paired with dimensionality reduction or data quantization.
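To make that concrete, below is a minimal batch k-means sketch in plain C++ with no dynamic allocation, sized for a microcontroller. K, DIM, and the buffer size are illustrative, and centroid seeding (e.g., k-means++) is assumed to have happened already.

#include <float.h>

#define K 3       // number of clusters
#define DIM 4     // features per sample
#define N 40      // samples in the bootstrap buffer

static float data[N][DIM];       // feature vectors, filled beforehand
float centroids[K][DIM];         // seeded beforehand (e.g., k-means++)

static float dist2(const float *a, const float *b) {
  float s = 0;
  for (int d = 0; d < DIM; d++) { float t = a[d] - b[d]; s += t * t; }
  return s;
}

// One Lloyd iteration: assign every sample to its nearest centroid, then
// recompute each centroid as the mean of the samples assigned to it.
void kmeansIteration(void) {
  float sums[K][DIM] = {{0}};
  int counts[K] = {0};
  for (int i = 0; i < N; i++) {
    int best = 0;
    float bestD = FLT_MAX;
    for (int k = 0; k < K; k++) {
      float d = dist2(data[i], centroids[k]);
      if (d < bestD) { bestD = d; best = k; }
    }
    counts[best]++;
    for (int d = 0; d < DIM; d++) sums[best][d] += data[i][d];
  }
  for (int k = 0; k < K; k++)
    if (counts[k] > 0)
      for (int d = 0; d < DIM; d++) centroids[k][d] = sums[k][d] / counts[k];
}

Calling kmeansIteration() a few dozen times (or until assignments stop changing) is typically enough for small feature sets like these.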
Anomaly detection is another valuable application of unsupervised learning. The idea is to learn what “normal” looks like, and then flag data that deviates from that baseline. This is especially useful in embedded systems for tasks such as predictive maintenance, fault detection, and spotting unusual sensor or network behavior.
Since anomalies are rare and often undefined ahead of time, unsupervised methods can help by learning patterns from normal operation and identifying outliers on the fly.
Techniques like autoencoders, clustering-based thresholds, or statistical models are commonly used—and with optimizations, many of these can run on low-power edge devices.
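As one concrete statistical approach, the sketch below learns a running mean and variance with Welford's algorithm and flags readings that sit far from the baseline. The warm-up count and 4-sigma threshold are illustrative choices, not fixed rules.

#include <math.h>

static float mean = 0, m2 = 0;   // running mean and sum of squared deviations
static unsigned long n = 0;

// Returns true once a reading deviates more than 4 standard deviations
// from the learned baseline. Needs ~100 samples before it starts flagging.
bool isAnomaly(float x) {
  n++;
  float delta = x - mean;
  mean += delta / n;             // Welford update of the running mean
  m2 += delta * (x - mean);      // Welford update of the deviation sum
  if (n < 100) return false;     // wait until the baseline is stable
  float stddev = sqrtf(m2 / (n - 1));
  return fabsf(x - mean) > 4.0f * stddev;
}

This fits in a handful of bytes of state, which is why simple statistical baselines are often the first choice on low-power devices before reaching for an autoencoder.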
Running unsupervised learning on embedded systems isn’t without trade-offs. Developers must weigh constraints such as limited RAM and flash, tight compute and power budgets, and how (or whether) models can be updated once deployed.
Frameworks like TensorFlow Lite for Microcontrollers and Edge Impulse, along with the broader TinyML ecosystem, are making it easier to deploy models that are small, fast, and energy-efficient.
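For a sense of scale, inference with TensorFlow Lite for Microcontrollers usually follows the shape below. This is a hedged outline, not a drop-in implementation: g_model_data, the op list, and the arena size are placeholders for your own exported model, and constructor details have varied across TFLM releases.

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern const unsigned char g_model_data[];   // trained model exported as a C array

constexpr int kArenaSize = 16 * 1024;        // tune to the model's needs
static uint8_t tensor_arena[kArenaSize];

static const tflite::Model *model = nullptr;
static tflite::MicroMutableOpResolver<2> resolver;
static tflite::MicroInterpreter *interpreter = nullptr;

void modelSetup() {
  model = tflite::GetModel(g_model_data);
  resolver.AddFullyConnected();              // register only the ops the model uses
  resolver.AddRelu();
  static tflite::MicroInterpreter static_interpreter(
      model, resolver, tensor_arena, kArenaSize);
  interpreter = &static_interpreter;
  interpreter->AllocateTensors();            // carve tensors out of the arena
}

float runModel(const float *features, int nFeatures) {
  TfLiteTensor *input = interpreter->input(0);
  for (int i = 0; i < nFeatures; i++) input->data.f[i] = features[i];
  if (interpreter->Invoke() != kTfLiteOk) return -1.0f;
  return interpreter->output(0)->data.f[0];  // e.g., an anomaly score
}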
K-means clustering is an unsupervised learning technique that groups data points into clusters based on how similar they are to one another. Unlike supervised methods that rely on labeled examples, K-means works with unlabeled data to reveal underlying patterns or structure. For instance, a wearable device might use K-means to distinguish between different activities, such as sleeping, walking, or running.
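Formally, "similarity" here means closeness in squared Euclidean distance: k-means seeks K centroids $\mu_k$ that minimize the within-cluster sum of squares

$$\min_{\mu_1,\dots,\mu_K}\ \sum_{k=1}^{K}\ \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,$$

where $C_k$ is the set of points currently assigned to centroid $\mu_k$. Lloyd's algorithm alternates the assignment and centroid-update steps shown in the sketch above until the assignments stop changing.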
IMU Activity Clustering (K-means from scratch) uses the LSM6DSL accelerometer/gyroscope to extract per-window features—mean, variance, energy, and spectral centroid from FFT magnitude bands—and feeds them into a K-means model (K=3–5) with k-means++ initialization, optionally supporting streaming centroid updates via a small learning rate. The system clusters unlabeled motion into “idle,” “walking,” and “shaking,” then displays the cluster ID on the OLED and sets the NeoPixel color by cluster. It’s a pure unsupervised pipeline from raw time-series data that lets you directly see how feature choices influence the resulting clusters.
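As a rough sketch of that feature stage (names are illustrative; the mag array of FFT magnitudes is assumed to be computed elsewhere, e.g., with arduinoFFT):

// Per-window feature vector: mean, variance, energy, and spectral centroid.
// x holds one 2-s window of samples; mag holds the FFT magnitude spectrum;
// binHz is the frequency resolution (sample rate / FFT length).
struct Features { float mean, var, energy, centroid; };

Features extractFeatures(const float *x, int n,
                         const float *mag, int bins, float binHz) {
  Features f = {0, 0, 0, 0};
  for (int i = 0; i < n; i++) f.mean += x[i];
  f.mean /= n;
  for (int i = 0; i < n; i++) {
    float d = x[i] - f.mean;
    f.var += d * d;            // accumulate squared deviation
    f.energy += x[i] * x[i];   // accumulate signal energy
  }
  f.var /= n;
  f.energy /= n;
  // Spectral centroid: magnitude-weighted mean frequency (bin 0 = DC, skipped).
  float wsum = 0, msum = 0;
  for (int k = 1; k < bins; k++) { wsum += k * binHz * mag[k]; msum += mag[k]; }
  f.centroid = (msum > 0) ? (wsum / msum) : 0;
  return f;
}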
We're going to need this platformio.ini file:
[env:esp32c6_sparrow]
; Use the pioarduino platform (Arduino-ESP32 3.x w/ C6 support)
platform = https://github.com/pioarduino/platform-espressif32/releases/download/54.03.20/platform-espressif32.zip
board = esp32-c6-devkitm-1
framework = arduino

; Faster serial + some optimization
monitor_speed = 115200
build_flags =
    -O2
    -DARDUINO_USB_MODE=1
    -DARDUINO_USB_CDC_ON_BOOT=1

; Libraries
lib_deps =
    https://github.com/dycodex/LSM6DSL-Arduino.git
    arduinoFFT @ ^1.6.1
    adafruit/Adafruit NeoPixel @ ^1.12.3
Build and upload the code example here.
The code samples the IMU at ~104 Hz in 2-second windows, builds a 4-D feature vector per window, and stores ~40 windows to bootstrap. It then runs k-means++ initialization followed by batch k-means, and finally switches to streaming inference with a gentle centroid update. Once the training period completes, the code prints the features and cluster assignment over serial and sets the NeoPixel color by cluster (0=red, 1=green, 2=blue).
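The streaming phase can be pictured with this minimal sketch (ALPHA is an illustrative learning rate; centroids comes from the batch phase sketched earlier):

#include <float.h>

#define K 3                      // clusters (matches the batch sketch)
#define DIM 4                    // features per window
#define ALPHA 0.05f              // illustrative streaming learning rate

extern float centroids[K][DIM];  // trained by the batch phase

static float dist2(const float *a, const float *b) {
  float s = 0;
  for (int d = 0; d < DIM; d++) { float t = a[d] - b[d]; s += t * t; }
  return s;
}

// Assign the new window to its nearest centroid, then nudge that centroid
// a small step toward the sample so the model tracks slow drift.
int classifyAndAdapt(const float *f) {
  int best = 0;
  float bestD = FLT_MAX;
  for (int k = 0; k < K; k++) {
    float d = dist2(f, centroids[k]);
    if (d < bestD) { bestD = d; best = k; }
  }
  for (int d = 0; d < DIM; d++)
    centroids[best][d] += ALPHA * (f[d] - centroids[best][d]);
  return best;                   // cluster ID selects the NeoPixel color
}

A small ALPHA keeps the clusters stable while still letting them drift with the wearer's behavior; setting it to zero freezes the model after training.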