Even though IoT's popularity has grown significantly in recent years, the concept of connecting small devices to a network is not a novelty. Starting in the '80s, many data transfer protocols have been developed for connecting embedded devices, mainly for automatic control and industrial applications: Modbus[1], BACnet, Optomux and hundreds of other protocols and variations. Those protocols were used to control both home appliances (like HVAC systems or security systems for buildings) and industrial equipment (PLCs, robots, drivers and controllers for various equipment). Forty years later, some of those protocols continue to be the de facto standard in their specific technological area, but most of them are obsolete and have been replaced.
Nowadays, most IoT devices rely on the Internet infrastructure or are at least accessible from the Internet. Security for IoT networks is a hot topic that opens up new challenges: can the overhead implied by security mechanisms be managed by resource-constrained, energy-efficient IoT nodes?
MQTT[2] is a lossless, bi-directional, connection-oriented protocol. It was first developed for industrial applications, but evolved over time into a general-purpose IoT protocol. It has applications in many industries: consumer products for smart homes, robotics, telecom, automotive, etc.
There are two entities in an MQTT-defined network: the clients and the broker.
MQTT is a publish-subscribe protocol. Each client can listen (subscribe) or speak (publish) on a certain topic. The broker receives all the published messages and routes them to the appropriate destination clients. There are a few types of messages that can be exchanged between clients and the broker: connect, disconnect, publish, subscribe, unsubscribe and ACKs. The interaction between clients and the broker is represented in Figure 1. This design is lightweight, simple to implement and scales up easily.
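The routing role of the broker can be illustrated with a minimal in-memory sketch. It is neither Mosquitto nor the MQTT wire protocol: the ToyBroker class and its methods are hypothetical names introduced here, topics are matched by exact string only (no wildcards), and there is no networking, QoS or acknowledgement handling.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A subscriber is modelled as a callback receiving (topic, payload).
Handler = Callable[[str, bytes], None]


class ToyBroker:
    """Hypothetical in-memory stand-in for an MQTT broker (exact-match topics only)."""

    def __init__(self) -> None:
        # topic -> list of subscriber callbacks
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def unsubscribe(self, topic: str, handler: Handler) -> None:
        if handler in self._subscribers[topic]:
            self._subscribers[topic].remove(handler)

    def publish(self, topic: str, payload: bytes) -> int:
        """Route the payload to every subscriber of the topic; return the delivery count."""
        handlers = list(self._subscribers[topic])
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


broker = ToyBroker()
received = []
broker.subscribe("TEST_TOPIC", lambda topic, payload: received.append((topic, payload)))
delivered = broker.publish("TEST_TOPIC", b"This is a message")
ignored = broker.publish("OTHER_TOPIC", b"nobody is listening")
```

Publishers never address subscribers directly; they only name a topic, which is what keeps the clients simple and lets the broker scale the fan-out.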
The fields inside an MQTT packet are[3]:
MQTT usually runs over TCP/IP, but it can be implemented on top of any network stack that guarantees lossless and ordered transport.
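As an illustration of the packet layout, the following sketch builds a QoS 0 PUBLISH packet according to the MQTT 3.1.1 format: a fixed header (packet type and flags, followed by the variable-length Remaining Length field), a variable header holding the length-prefixed topic name, and the payload. The function names are chosen here for illustration only, and other packet types and QoS levels are not covered.

```python
import struct

PUBLISH = 3  # control packet type of PUBLISH in MQTT 3.1.1


def encode_remaining_length(length: int) -> bytes:
    """Encode Remaining Length: 7 data bits per byte, top bit set when more bytes follow."""
    if not 0 <= length <= 268_435_455:
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        digit = length % 128
        length //= 128
        if length > 0:
            digit |= 0x80  # continuation bit
        out.append(digit)
        if length == 0:
            return bytes(out)


def build_publish(topic: str, payload: bytes) -> bytes:
    """Build a QoS 0 PUBLISH packet: fixed header, topic name, payload."""
    topic_bytes = topic.encode("utf-8")
    # Variable header: 2-byte big-endian topic length, then the topic itself.
    variable_header = struct.pack("!H", len(topic_bytes)) + topic_bytes
    first_byte = PUBLISH << 4  # flags DUP=0, QoS=0, RETAIN=0
    remaining = len(variable_header) + len(payload)
    return bytes([first_byte]) + encode_remaining_length(remaining) + variable_header + payload


packet = build_publish("TEST_TOPIC", b"This is a message")
```

For small sensor readings the whole header costs only a few bytes on top of the topic name, which is one reason the protocol suits constrained nodes.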
The set-up is depicted in Figure 2 and represents a minimal implementation of an IoT network. The ESP32 Wroom is the only client in the network, connected to the LAN over the 802.11 WiFi infrastructure. The broker runs on a machine connected to the LAN over Ethernet. The LAN is connected to the Internet and there are no inbound rules defined to limit MQTT connections from outside the local network.
Mosquitto[4] is an open source MQTT broker, suitable for both research and production purposes. Although the MQTT protocol is simple and straightforward, the broker provides many configuration options to control the interaction with its clients. In order to run Mosquitto, the following packages should be installed on the broker machine (apt is the package manager of choice here):
$ sudo apt update
$ sudo apt install mosquitto mosquitto-clients
The default configuration file for Mosquitto is located at /etc/mosquitto/mosquitto.conf. Custom configuration files can be defined in /etc/mosquitto/conf.d/. By default, the Mosquitto service listens on port 1883 with the default configuration, but several Mosquitto daemons can be started simultaneously, on different ports and with custom configurations, with:
$ sudo mosquitto -p port_number -c /etc/mosquitto/conf.d/example.conf
To simulate clients, use mosquitto_sub and mosquitto_pub in order to listen and publish on topics:
$ mosquitto_sub -p 1883 -t "TEST_TOPIC"
$ mosquitto_pub -m "This is a message" -p 1883 -t "TEST_TOPIC"
The network ports must be open on the machine and the Mosquitto daemons should be listening on the address range and port specified in the configuration file:
$ sudo ufw allow 1883
$ sudo netstat -nlp | grep mosquitto
In this scenario, anyone on the Internet can connect to the MQTT broker, using the IP address of the LAN gateway and the port number of the Mosquitto service. Port forwarding should be configured on the router to forward all packets to the broker machine, otherwise MQTT packets may be dropped. An attacker can connect to the broker and publish or subscribe to the running topics. As long as the attacker is connected to the broker, it doesn't matter if other nodes secure their communication with the broker, as all data flowing through the topics will be visible to the attacker. Also, in this scenario, the broker could be flooded with new connections from fake clients.
MQTT brokers can be configured to allow connections only from clients providing known credentials[5]. On the client side, the username and password are stored in clear text. The server side stores the allowed usernames and the corresponding password hashes. The broker will receive CONNECT requests from everyone, but will accept only the clients with valid credentials. The downside of this method, as exemplified in Figure 3, is that both the payload (e.g. values from a sensor) and the password in the initial CONNECT packet are sent over the network in clear text, so an attacker can capture this information and use the credentials to connect its own clients to the broker.
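To make the clear-text exposure concrete, here is a minimal sketch that assembles an MQTT 3.1.1 CONNECT packet carrying a username and a password and then searches the raw bytes for the password. The helper names, client identifier and credentials are made-up values for illustration; will messages and MQTT 5 properties are left out.

```python
import struct


def field(data: bytes) -> bytes:
    """Length-prefixed field: 2-byte big-endian length followed by the bytes."""
    return struct.pack("!H", len(data)) + data


def remaining_length(length: int) -> bytes:
    """Variable-length encoding used by the MQTT fixed header."""
    out = bytearray()
    while True:
        digit, length = length % 128, length // 128
        if length > 0:
            digit |= 0x80
        out.append(digit)
        if length == 0:
            return bytes(out)


def build_connect(client_id: str, username: str, password: str, keepalive: int = 60) -> bytes:
    """Build a CONNECT packet with username and password (no will message)."""
    flags = 0x80 | 0x40 | 0x02  # username present, password present, clean session
    variable_header = field(b"MQTT") + bytes([0x04, flags]) + struct.pack("!H", keepalive)
    payload = field(client_id.encode()) + field(username.encode()) + field(password.encode())
    body = variable_header + payload
    return bytes([0x10]) + remaining_length(len(body)) + body


# Hypothetical credentials, used only to show what a packet capture would contain.
packet = build_connect("esp32-client", "sensor_user", "s3cret-password")
password_visible = b"s3cret-password" in packet
```

Because the password is only length-prefixed and never transformed, anyone who can capture this single packet can replay the credentials.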
AES[6] is a standard for symmetric encryption. Its design principle is known as a substitution-permutation network. For both encryption and decryption, a matrix is built from the bytes of the payload and combined with the key; the bytes of the matrix are then substituted, its rows shifted and its columns mixed in a fixed order, over several rounds.
CBC[7] (Cipher Block Chaining) is a mode of operation for encrypting and decrypting large strings. The payload is split into smaller chunks, called blocks, and AES is used to encrypt one block at a time. Each resulting ciphertext block is then used in the encryption of the following block, by XOR-ing it with the next plaintext block before that block is encrypted. An initialization vector (IV) is a random string that takes the place of the previous ciphertext block when the first block is encrypted. Each block (IV, plaintext and ciphertext) must have the same size. Before the AES-CBC encryption, the client and the server must agree on a secret key and an IV, which is outside the scope of AES. The ESP32 Wroom has integrated hardware acceleration support[8] for some cryptographic operations: AES, RSA, SHA hashing and RNG (random number generation).
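The chaining step can be sketched in a few lines. Important assumption: the block function below is a toy, insecure stand-in (XOR with the key, then a one-byte rotation) and is NOT AES; it is used only so that the example runs with the standard library and keeps the focus on how CBC links blocks together. Padding is also omitted, so the plaintext length must be a multiple of the block size.

```python
import os

BLOCK = 16  # AES block size in bytes


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


# Toy, insecure stand-in for the AES block function (NOT AES).
def toy_encrypt_block(block: bytes, key: bytes) -> bytes:
    mixed = xor(block, key)
    return mixed[1:] + mixed[:1]  # rotate left by one byte


def toy_decrypt_block(block: bytes, key: bytes) -> bytes:
    unrotated = block[-1:] + block[:-1]  # rotate right by one byte
    return xor(unrotated, key)


def cbc_encrypt(plaintext: bytes, key: bytes, iv: bytes) -> bytes:
    if len(plaintext) % BLOCK or len(key) != BLOCK or len(iv) != BLOCK:
        raise ValueError("plaintext must be block-aligned; key and IV must be one block")
    previous, out = iv, bytearray()
    for i in range(0, len(plaintext), BLOCK):
        # XOR with the previous ciphertext block (or the IV), then encrypt.
        previous = toy_encrypt_block(xor(plaintext[i:i + BLOCK], previous), key)
        out += previous
    return bytes(out)


def cbc_decrypt(ciphertext: bytes, key: bytes, iv: bytes) -> bytes:
    previous, out = iv, bytearray()
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        # Decrypt, then undo the XOR with the previous ciphertext block (or the IV).
        out += xor(toy_decrypt_block(block, key), previous)
        previous = block
    return bytes(out)


key = b"K" * BLOCK
iv = os.urandom(BLOCK)
plaintext = b"A" * (2 * BLOCK)  # two identical plaintext blocks
ciphertext = cbc_encrypt(plaintext, key, iv)
```

Since every block depends on the one before it, two identical plaintext blocks generally produce different ciphertext blocks, and encryption of a message cannot be parallelised across blocks.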
This security level encrypts the application data carried in the MQTT payload, not the MQTT packet itself. This payload may represent a numerical value read from a sensor or any other piece of information that a client would like to send to or receive from the broker. The encryption and decryption are done at the application level, so MQTT and its underlying protocols are not altered. In this scenario, an attacker can still execute TCP and MQTT-specific attacks: change the packet order, drop packets, flood the MQTT server, alter (but not read in clear) payloads, change MQTT flags, leak topic names, etc.
TLS (Transport Layer Security)[9] is a protocol that runs on top of TCP (security above OSI layer 4). TLS connections are authenticated with an asymmetric key pair, whose public half is distributed in a certificate. Attackers cannot decrypt the content of the messages, even if they observe the security negotiation (MITM attacks). The integrity of the connection is also protected: TLS records that are altered or injected by an attacker are detected and rejected.
The Arduino library WiFiClientSecure.h[10] implements TLS for the ESP32 platform. To generate a valid TLS certificate with the openssl[11] toolkit:
Generate a key pair for the CA:
openssl genrsa -des3 -out ca.key 2048
Create a certificate for the CA using the CA key previously generated:
openssl req -new -x509 -days 1826 -key ca.key -out ca.crt
Create the server key pair that will be used by the broker:
openssl genrsa -out server.key 2048
Create a certificate signing request (.csr); be careful to use the broker's IP address or hostname for the Common Name field:
openssl req -new -out server.csr -key server.key
Use the CA key to sign the server certificate request:
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial -out server.crt -days 360
Copy the CA certificate (ca.crt), the signed server certificate (server.crt) and the server-side key (server.key) to /etc/mosquitto/ca_certificates and configure Mosquitto to use TLS for connections:
$ cat /etc/mosquitto/conf.d/tls_security.conf
listener 1884 0.0.0.0
allow_anonymous true
cafile /etc/mosquitto/ca_certificates/ca.crt
keyfile /etc/mosquitto/ca_certificates/server.key
certfile /etc/mosquitto/ca_certificates/server.crt
tls_version tlsv1.2
Finally, copy the CA certificate (ca.crt) to the client. When the client attempts to connect, the server presents its certificate (server.crt) to the device. The server's domain name or IP address is encoded in this certificate. The CA certificate stored on the device (ca.crt) is used to verify the signature on the server's certificate. If the signature is valid, then the server's certificate is trusted. The TLS client also checks that the domain name in the certificate matches the domain name of the server; otherwise the connection is terminated. This process is illustrated below.
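The same verification steps can be reproduced from a desktop machine with Python's standard ssl module, which is convenient for checking the broker certificate before flashing the ESP32. This is only a sketch under assumptions: the helper names are introduced here, the default port 1884 and the ca.crt file name are taken from the configuration above, and no MQTT traffic is exchanged after the handshake.

```python
import socket
import ssl
from typing import Optional


def make_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Context that trusts ca_file (or the system store when None) and verifies the server."""
    context = ssl.create_default_context(cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # create_default_context already enables both checks; they are set explicitly for clarity.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def open_broker_connection(host: str, port: int = 1884, ca_file: str = "ca.crt") -> ssl.SSLSocket:
    """Open a TLS connection to the broker; raises ssl.SSLCertVerificationError when the
    certificate is not signed by the CA or does not match the host name."""
    context = make_client_context(ca_file)
    raw = socket.create_connection((host, port), timeout=10)
    try:
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

A call such as open_broker_connection("broker.example", ca_file="ca.crt"), where the host name is a placeholder, fails during the handshake if the Common Name (or subject alternative name) chosen when creating the certificate request does not match the name used to reach the broker.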
As seen in the packet analysis in Figure 7, the MQTT packets are not visible anymore, being encapsulated and encrypted by TLS. The protocol specific secured key exchange and handshake procedures are also visible in the capture.
In this section, the performance overhead implied by the security levels described above is measured. The L0 implementation is used as the baseline: as there is no encryption implemented, only the wireless TX consumes time. L1 is not tested, as the authentication procedure runs only once per connection and its overhead is considered negligible. L2 and L3 use encryption, and their performance overhead is visible.
Measuring the execution time is a valid method to estimate the performance of each implementation, as the core frequency of the MCU (Xtensa LX-6) is constant at 240MHz and there are no other tasks running in parallel. For these tests, in order to measure relevant results, the unused peripherals and interrupts were not deactivated, but they were also not used. Some peripherals may introduce large delays: for example, sending over UART is 20 times slower than using the wireless interface.
In 802.11 wireless networks, there are many scenarios in which frames (Layer 2 MAC frames, Medium Access Control) may be retransmitted for an undefined number of retries[12], which is a time-consuming operation and may severely alter the measurements. For this experiment the total number of MAC retries was not measured, but to keep retries low and constant:
For each test run, the client connects to the wireless network and to the MQTT broker (negligible overhead), then starts to transmit a number of TOTAL_MESSAGES messages, each of size MESSAGE_SIZE = CHUNK_SIZE * CHUNK_COUNT. The default CHUNK_SIZE is 32 bytes. Before transmitting each message, L2 also runs AES-CBC encryption. For L3, the encryption is done at TLS level. Only CHUNK_COUNT and TOTAL_MESSAGES parameters are modified:
CHUNK_COUNT | #250 | #500 | #1000 | #2500 | #5000 |
MESSAGE_SIZE | 8KB | 16KB | 32KB | 80KB | 160KB |
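The structure of one test run can be sketched as follows. This is a host-side illustration, not the firmware used for the measurements: run_experiment and its send argument are hypothetical names, and send stands in for whatever transmits one message (plain MQTT for L0, AES-CBC followed by MQTT for L2, MQTT over TLS for L3).

```python
import time
from typing import Callable

CHUNK_SIZE = 32  # bytes, the default stated in the text
CHUNK_COUNTS = [250, 500, 1000, 2500, 5000]


def message_size(chunk_count: int, chunk_size: int = CHUNK_SIZE) -> int:
    """MESSAGE_SIZE = CHUNK_SIZE * CHUNK_COUNT, in bytes."""
    return chunk_size * chunk_count


def run_experiment(send: Callable[[bytes], None], total_messages: int, chunk_count: int) -> float:
    """Send total_messages messages of the configured size; return the duration in ms."""
    message = bytes(message_size(chunk_count))  # zero-filled dummy payload
    start = time.perf_counter()
    for _ in range(total_messages):
        send(message)
    return (time.perf_counter() - start) * 1000.0


# Message sizes in KB (1 KB = 1000 bytes here) for each CHUNK_COUNT in the table.
sizes_kb = {count: message_size(count) // 1000 for count in CHUNK_COUNTS}
```

Passing a list's append method as send, for example, gives a dry run that checks the message sizes without touching the network.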
For the maximum MESSAGE_SIZE of 160KB, the total number of bytes encrypted (for L2 and L3) and then transmitted during the experiment is:
TOTAL_MESSAGES | #100 | #500 | #1000 | #2500 | #10000 |
TOTAL_TX_SIZE | 40MB | 80MB | 160MB | 400MB | 1.6GB |
Note that TOTAL_TX_SIZE is very large for generic WSN nodes, which usually transmit messages only a few bytes long; the message size was deliberately oversized in order to test the performance of the encryption model.
Figure 8, Figure 9 and Figure 10 plot the measured values for all experiments. The x-axis represents the size of one message (MESSAGE_SIZE) in KB, which is relevant for the encryption algorithm performance and memory load. The y-axis represents the total duration of the experiment in milliseconds.
As expected, the L0 implementation has the least overhead because no encryption is involved. For low MESSAGE_SIZE values, the overhead of L2 and L3 compared to L0 is 200%. For bigger MESSAGE_SIZE values, L2 has almost 300% overhead and L3 only 150% overhead compared to L0. As more messages (TOTAL_MESSAGES) are exchanged during one connection, L2 is more dependent on MESSAGE_SIZE than L3. This may be because the input plaintext length for each encryption operation is different for L2 and L3. The maximum TCP packet size is 64KB, but data link layer protocols allow a maximum MTU of 2304B (802.11) and 1500B (Ethernet). In a real-life scenario, the maximum segment size for a TCP packet is 1448B. L2 first encrypts the whole message with AES-CBC, then the ciphertext is handed to the TCP layer and split into smaller chunks. For L3, the plaintext is first split into small chunks, then each chunk is encrypted individually, so the encryption algorithm (TLS also uses AES for encryption) has to digest smaller inputs. The overhead may be introduced by the AES-CBC algorithm, which may exhibit exponential growth in encryption time as the input plaintext size gets larger[15].
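To give a sense of scale for the segmentation described above, the short calculation below estimates how many TCP segments one message occupies at the 1448B segment size quoted in the text. It is a rough sketch: MQTT and TLS framing overhead is ignored, 1 KB is taken as 1000 bytes, and segment_count is a name introduced here.

```python
import math

MSS = 1448  # bytes per TCP segment, the value quoted in the text


def segment_count(message_size: int, mss: int = MSS) -> int:
    """Number of TCP segments needed for message_size bytes (headers ignored)."""
    return math.ceil(message_size / mss)


# MESSAGE_SIZE values from the experiment, in bytes.
counts = {size: segment_count(size) for size in (8_000, 16_000, 32_000, 80_000, 160_000)}
```

Under these assumptions the largest message spans more than a hundred segments, while L2 still has to encrypt it as a single AES-CBC input before any of it is sent.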
Although L0 and L3 ran experiments with MESSAGE_SIZE up to 160KB, the L2 implementation ran out of memory for MESSAGE_SIZE of 80KB and over. The ESP32[15] has a total of 520KB of RAM, split across 3 SRAM modules. The .text section is mapped to SRAM0 (192KB). The .heap section is mapped to SRAM1 (128KB), and free space from SRAM0 and SRAM2 is also reserved for the .heap section. The L2 implementation needs 2 * MESSAGE_SIZE of memory, as the encryption function needs one source array for the plaintext and one destination array for the ciphertext. The L2 implementation crashed because the heap allocator could not allocate more than 160KB in the remaining .heap space. The dynamic memory consumption of L2 at runtime is at least twice that of L0 and L3. On the other hand, the L3 implementation uses less .heap, but .data has to store the SSL certificate key (2KB). The ESP32's memory layout is represented in Figure 12. The ESP32 has quite a lot of DRAM, but there are embedded IoT clients with less available memory.
For small values of MESSAGE_SIZE and high values of TOTAL_MESSAGES sent during the experiment, the overhead of L2 was slightly lower than that of L3, because AES-CBC has fewer blocks to process when encrypting one message, as represented in Figure 11. This is a common scenario for IoT networks, where nodes send small messages over the network (from a few bytes up to a few kilobytes). This holds only for high values of TOTAL_MESSAGES, such as a client sending float values read from a sensor several times per second; otherwise TLS may be a better choice performance-wise. Remember that only the values read from the sensor are obfuscated, so an attacker is still able to attack the communication itself.
The following conclusions were drawn from the above experiments: