iothings:proiecte:2022sric:ub-benchmark, current revision [2023/06/01 22:09] by lucian_ioan.popescu (previous revision [2023/06/01 11:25] by lucian_ioan.popescu)
==== Compiling Nuttx with Clang ====
  
The first goal of this project was to compile Nuttx with Clang, because much of my previous work on researching the impact of UB for other use cases had already been done in Clang. There were already some efforts in this area [2], but I could not use them because they did not provide a complete toolchain for compiling Nuttx.

The ISA used by ESP32 boards is called Xtensa and is designed by Cadence [4]. Espressif had already started much of the work of targeting this architecture in LLVM [3]. The first step in integrating Espressif's LLVM fork into Nuttx was to hack the Nuttx build system so that Nuttx could be compiled with Clang. The patches that I introduced can be found in my fork of Nuttx [5].

In summary, I had to make the following changes:
  * Add `-target xtensa` to `ARCHCFLAGS` and `ARCHCXXFLAGS`
  * Use the binutils linker and libraries
  * Patch the `_bbci` and `srli` assembly instructions in source files, because they were not handled correctly
  * Modify the generated `.config` file to replace `CONFIG_XTENSA_TOOLCHAIN_ESP` with `CONFIG_XTENSA_TOOLCHAIN_XCLANG`
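
The `.config` change in the last item is a plain text substitution, so it can be scripted. The following is a minimal Python sketch, assuming the two symbol names above appear verbatim in the generated `.config`; `switch_toolchain` and `patch_config_file` are hypothetical helpers, not part of the Nuttx tooling.

<code python>
import re
from pathlib import Path

# Kconfig symbols from the list above: the ESP toolchain option is
# swapped for the XCLANG option.
OLD_SYMBOL = "CONFIG_XTENSA_TOOLCHAIN_ESP"
NEW_SYMBOL = "CONFIG_XTENSA_TOOLCHAIN_XCLANG"

# \b on both sides: longer symbols that merely start with OLD_SYMBOL are left alone.
_OLD_RE = re.compile(rf"\b{re.escape(OLD_SYMBOL)}\b")


def switch_toolchain(config_text: str) -> str:
    """Return config_text with every whole-word OLD_SYMBOL replaced by NEW_SYMBOL."""
    return _OLD_RE.sub(NEW_SYMBOL, config_text)


def patch_config_file(path: str) -> None:
    """Rewrite a generated .config file in place (hypothetical helper)."""
    config = Path(path)
    config.write_text(switch_toolchain(config.read_text()))


if __name__ == "__main__":
    sample = "CONFIG_XTENSA_TOOLCHAIN_ESP=y\n# CONFIG_DEBUG_FEATURES is not set\n"
    print(switch_toolchain(sample), end="")
</code>

A whole-word match is used so that an option whose name merely starts with `CONFIG_XTENSA_TOOLCHAIN_ESP` is not rewritten by accident.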

After this step was finished, I had to patch Xtensa LLVM in order to compile all of the Nuttx source code successfully. You can find my fork of Xtensa LLVM here [6]. The modification itself was rather simple: it fixed a typo in the register info TableGen file, where the name of the `intset` register was wrongly typed as `interrupt`. However, the process was time-consuming, because I had to debug various parts of the Xtensa backend before getting to the root cause of the problem.

At this point, I had successfully compiled Nuttx with Xtensa LLVM.

==== Running the Benchmarks ====

To run the benchmarks and collect the results, I used CoreMark [7]. From its website:
"EEMBC’s CoreMark® is a benchmark that measures the performance of microcontrollers (MCUs) and central processing units (CPUs) used in embedded systems. Replacing the antiquated Dhrystone benchmark, Coremark contains implementations of the following algorithms: list processing (find and sort), matrix manipulation (common matrix operations), state machine (determine if an input stream contains valid numbers), and CRC (cyclic redundancy check). It is designed to run on devices from 8-bit microcontrollers to 64-bit microprocessors."

The metrics I was interested in are the following: CoreMark score (speed of execution), code size and power consumption. For the first metric, I used the output of CoreMark, which is presented below. For the second metric, I measured the binary size of Nuttx and CoreMark after compiling them. For the third metric, I used a USB tester [8] that displays the voltage and the current consumed by my ESP32 board [10].

The following is a sample CoreMark output:
<code>
2K performance run parameters for coremark.
CoreMark Size    : 666
Total ticks      : 59040
Total time (secs): 59.040000
Iterations/Sec   : 338.753388
Iterations       : 20000
Compiler version : Clang 15.0.0 (git@github.com:lucic71/llvm-project-espressif.git ae7b70b2d0097fd6745ebf2ade6fdffccc879142)
Compiler flags   : -fomit-frame-pointer -ffunction-sections -fdata-sections -O2 -fwrapv
Memory location  : Stack
seedcrc          : 0xe9f5
[0]crclist       : 0xe714
[0]crclist       : 0xe714
[0]crcmatrix     : 0x1fd7
[0]crcstate      : 0x8e3a
[0]crcfinal      : 0x382f
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 338.753388 / Clang 15.0.0 (git@github.com:lucic71/llvm-project-espressif.git ae7b70b2d0097fd6745ebf2ade6fdffccc879142) -fomit-frame-pointer -ffunction-sections -fdata-sections -O2 -fwrapv / Stack
</code>

From this output, we are interested in the last line. It displays the CoreMark score and the compiler configuration used to generate this score. For this benchmark, higher scores represent better results.
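
Because only the last line matters, extracting the score can be automated. The following is a minimal Python sketch, assuming the summary line has the `CoreMark 1.0 : <score> / <compiler> / <memory location>` layout of the sample output above; `parse_summary` and `CoreMarkResult` are illustrative names.

<code python>
import re
from dataclasses import dataclass

# Layout of the final summary line, as in the sample output:
#   CoreMark 1.0 : <score> / <compiler version and flags> / <memory location>
# The greedy (.+) lets the compiler part contain "/" characters (e.g. in URLs).
SUMMARY_RE = re.compile(
    r"^CoreMark 1\.0 : (?P<score>\d+(?:\.\d+)?) / (?P<compiler>.+) / (?P<memory>\S+)\s*$"
)


@dataclass
class CoreMarkResult:
    score: float   # iterations per second; higher is better
    compiler: str  # compiler version followed by the compiler flags
    memory: str    # where the benchmark data lives, e.g. "Stack"


def parse_summary(output: str) -> CoreMarkResult:
    """Parse the last 'CoreMark 1.0 :' line found in a run's output."""
    summaries = [line for line in output.splitlines() if line.startswith("CoreMark 1.0 :")]
    if not summaries:
        raise ValueError("no CoreMark summary line found")
    match = SUMMARY_RE.match(summaries[-1])
    if match is None:
        raise ValueError(f"unrecognized summary line: {summaries[-1]!r}")
    return CoreMarkResult(float(match["score"]), match["compiler"], match["memory"])


if __name__ == "__main__":
    line = (
        "CoreMark 1.0 : 338.753388 / Clang 15.0.0 (git@github.com:lucic71/"
        "llvm-project-espressif.git ae7b70b2d0097fd6745ebf2ade6fdffccc879142) "
        "-fomit-frame-pointer -ffunction-sections -fdata-sections -O2 -fwrapv / Stack"
    )
    result = parse_summary(line)
    print(result.score, result.memory)
</code>

For the sample line this prints `338.753388 Stack`. As a sanity check, the score equals Iterations divided by Total time from the same output: 20000 / 59.04 ≈ 338.7534.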

To change the compiler configuration, `ARCHOPTIMIZATION` needs to be changed accordingly in `arch/xtensa/src/lx6/Toolchain.defs`.

Next, I present all the configurations used in this experiment. I used a total of 13 configurations, based on various flags that change the behavior of the compiler with regard to exploiting UB.

^ No ^ Configuration ^ Description ^
| 1 | -fwrapv | Treat signed integer overflow as two's complement wraparound |
| 2 | -fno-strict-aliasing | Don't use type-based alias analysis |
| 3 | -fstrict-enums | Enable optimizations that take advantage of an enum's value range |
| 4 | -fno-delete-null-pointer-checks | Assume that programs can safely dereference null pointers |
| 5 | -fno-finite-loops | Don't assume that all loops are finite |
| 6 | -fconstrain-shift-value | Constrain the shift amount (RHS) so that shifts don't produce undefined results when RHS >= bit width |
| 7 | -fno-constrain-bool-value | Don't constrain bool values to {0, 1} |
| 8 | all + -O2 | All flags from above + -O2 |
| 9 | all + -Os | All flags from above + -Os |
| 10 | base + -O2 | None of the flags from above + -O2 |
| 11 | base + -Os | None of the flags from above + -Os |
| 12 | -fno-use-default-alignment | Use an alignment of one for all memory operations |
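
The rows of the table map onto `ARCHOPTIMIZATION` values. The following is a minimal Python sketch that builds them, assuming that the single-flag rows (1-7 and 12) are combined with `-O2`, as the `-O2 -fwrapv` flags in the sample output suggest for row 1, and that "all" means flags 1-7. Only the optimization level and the UB flags are modelled; any other flags already set in `Toolchain.defs` are left out. `UB_FLAGS` and `archoptimization` are hypothetical names.

<code python>
# UB-related flags from rows 1-7 of the table, in table order.
UB_FLAGS = [
    "-fwrapv",
    "-fno-strict-aliasing",
    "-fstrict-enums",
    "-fno-delete-null-pointer-checks",
    "-fno-finite-loops",
    "-fconstrain-shift-value",
    "-fno-constrain-bool-value",
]

# Assumption: single-flag configurations are built at -O2.
DEFAULT_OPT = "-O2"


def archoptimization(config: int) -> str:
    """Return the optimization/UB part of ARCHOPTIMIZATION for table row 1-12."""
    if 1 <= config <= 7:  # one UB flag on top of the default level
        return f"{DEFAULT_OPT} {UB_FLAGS[config - 1]}"
    if config in (8, 9):  # all + -O2 / all + -Os
        level = "-O2" if config == 8 else "-Os"
        return " ".join([level, *UB_FLAGS])
    if config in (10, 11):  # base + -O2 / base + -Os
        return "-O2" if config == 10 else "-Os"
    if config == 12:  # alignment experiment
        return f"{DEFAULT_OPT} -fno-use-default-alignment"
    raise ValueError(f"unknown configuration: {config}")


if __name__ == "__main__":
    for number in range(1, 13):
        print(f"{number:2d}: ARCHOPTIMIZATION = {archoptimization(number)}")
</code>

Keeping the flags in one list makes the "all" rows follow automatically from the single-flag rows, so the two cannot drift apart.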

==== Results ====

The first set of results covers power consumption. For all benchmark configurations, the consumed current was 90mA and the voltage was 5.11V. Because the USB tester that I used has a resolution of 10mA, it could not distinguish values between 85mA and 95mA, so all the results have the same value, i.e. 90mA. In idle mode, the board consumed 70mA.
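
The readings can be converted to power with P = V * I. The following is a minimal Python sketch, assuming that a 10mA resolution means the true current lies within ±5mA of the displayed 90mA; `power_w` is an illustrative helper.

<code python>
VOLTAGE_V = 5.11      # voltage shown by the USB tester during the benchmarks
CURRENT_A = 0.090     # displayed current (90mA)
RESOLUTION_A = 0.010  # tester resolution (10mA)


def power_w(voltage_v: float, current_a: float) -> float:
    """Electrical power in watts: P = V * I."""
    return voltage_v * current_a


if __name__ == "__main__":
    nominal = power_w(VOLTAGE_V, CURRENT_A)
    # Assumption: the true current is within half a resolution step of the reading.
    low = power_w(VOLTAGE_V, CURRENT_A - RESOLUTION_A / 2)
    high = power_w(VOLTAGE_V, CURRENT_A + RESOLUTION_A / 2)
    print(f"{nominal:.3f} W (between {low:.3f} W and {high:.3f} W)")
</code>

It prints `0.460 W (between 0.434 W and 0.485 W)`. One resolution step of 10mA corresponds to about 0.05 W at 5.11V, so differences between configurations smaller than that may not show up on this tester.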

Note that during the experiments, the board was put in Modem-sleep power mode [9] at normal speed (80MHz). The datasheet states that in this configuration the board should consume between 20mA and 30mA, rather than the 70mA shown on the USB tester. The reasons for this discrepancy are unknown at this moment.

The next set of results concerns code size. After each compilation with a particular compiler configuration, the sizes of the generated binaries, nuttx and nuttx.bin, were recorded. The results are presented in the following plot:

{{:iothings:proiecte:2022sric:code-size.png?900|}} \\

`all + -O2` increases the code size by a small percentage compared to the configurations to its left in the plot, i.e. the configurations where a single flag is used. Furthermore, `all + -O2` and `all + -Os` both increase the code size compared to their counterparts, `base + -O2` and `base + -Os`, by 1% in the case of nuttx.bin. Thus, for Nuttx and CoreMark, the code size increases when the UB-related flags are used.
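
The 1% figures above are relative changes between two binary sizes. The following is a minimal Python sketch of that computation, assuming the sizes are the on-disk file sizes of the build artifacts; the script name `compare_sizes.py` and its command-line interface are hypothetical.

<code python>
import os
import sys


def binary_size(path: str) -> int:
    """On-disk size of a build artifact (for example nuttx.bin), in bytes."""
    return os.path.getsize(path)


def relative_change_percent(size: int, baseline: int) -> float:
    """Change of `size` relative to `baseline`, in percent (positive = larger)."""
    if baseline <= 0:
        raise ValueError("baseline size must be positive")
    return (size - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical usage: python compare_sizes.py <binary of all + -O2> <binary of base + -O2>
    if len(sys.argv) == 3:
        new_path, base_path = sys.argv[1], sys.argv[2]
        change = relative_change_percent(binary_size(new_path), binary_size(base_path))
        print(f"{new_path}: {change:+.2f}% relative to {base_path}")
    else:
        print("usage: compare_sizes.py NEW_BINARY BASELINE_BINARY")
</code>

Passing the nuttx.bin of the `all + -O2` build and the nuttx.bin of the `base + -O2` build gives the kind of comparison discussed above. For the ELF file nuttx, the file size also counts sections that are not loaded on the board, such as the symbol table, so the two artifacts can show different percentages.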

The final set of results covers the CoreMark score for each configuration. The score is extracted from the last line of the CoreMark output presented in the previous section.

{{:iothings:proiecte:2022sric:coremark-score.png?900|}} \\

There is no noticeable improvement of any of the configurations that make use of the UB flags over `base + -O2`. However, the impact of `all + -Os` compared with `base + -Os` is interesting: performance decreases by 1%.

Note that no result set contains numbers for the `-fno-use-default-alignment` configuration. This is because Nuttx crashes when compiled with this flag, so no benchmark can be run. Compared to x86, which this flag was initially targeted at, Xtensa has stricter alignment rules that cannot be modified.

==== Conclusions and Further Work ====

The results show that there is not much difference in terms of code size, code speed and power consumption in the context of undefined behavior for Nuttx and CoreMark. One reason might be that, while developing these systems, the developers made little to no use of undefined behavior. Another reason might be that Xtensa LLVM cannot take proper advantage of undefined behavior when triggering optimizations for Nuttx and CoreMark.

These are interesting paths worth further research. Moreover, rerunning the experiments with a more precise USB tester could lead to more accurate results regarding the power consumption of the Xtensa processor.
==== References ====
  
[1] [[https://gist.github.com/rygorous/e0f055bfb74e3d5f0af20690759de5a7|A bit of background on compilers exploiting signed overflow]] \\
[2] [[https://meka.rs/blog/2017/07/03/nuttx-and-clang/|NuttX and Clang]] \\
[3] [[https://github.com/espressif/llvm-project|Fork of LLVM targeted at Xtensa]] \\
[4] [[https://www.cadence.com/content/dam/cadence-www/global/en_US/documents/tools/ip/tensilica-ip/isa-summary.pdf|Xtensa ISA]] \\
[5] [[https://github.com/lucic71/nuttx|My fork of Nuttx]] \\
[6] [[https://github.com/lucic71/llvm-project-espressif|My fork of Xtensa LLVM]] \\
[7] [[https://www.eembc.org/coremark/|CoreMark]] \\
[8] [[https://www.emag.ro/tester-usb-ut658-uni-t-afisaj-lcd-9999-mah-mie0415/pd/DYL0T7MBM/?ref=hdr-favorite_products|USB tester UT658 Uni-T, LCD display, 9999 mAh]] \\
[9] [[https://www.espressif.com/sites/default/files/documentation/esp32_datasheet_en.pdf|ESP32 datasheet]] \\
[10] [[https://www.emag.ro/placa-dezvoltare-esp32-devkit-v1-ai669/pd/DXV9FDMBM/|ESP32 development board, DEVKIT V1]]
iothings/proiecte/2022sric/ub-benchmark.1685607957.txt.gz · Last modified: 2023/06/01 11:25 by lucian_ioan.popescu
CC Attribution-Share Alike 3.0 Unported