ST Edge AI Core for STM32MPU series
for STM32MPU target, based on ST Edge AI Core Technology 2.2.0
Overview
This article describes the specifics of the command line interface for STM32MPU targets.
Supported features
The command line allows the user to select a specific target. In this article, the focus is on STM32MPU targets.
The main CLI functions for STM32MPUs are described below:
- The analyze command allows you to gather information about your model, such as shapes, format, and quantization scheme. It is currently under development and will be available soon.
- The generate command is used to create, from your base model, an optimized model that can run on the Neural Processing Unit (NPU). This command generates a Network Binary Graph (NBG), which can be copied directly to your STM32MPU board for use in your application.
- The validate command allows you to benchmark the generated model and verify the impact of the optimization.
On STM32MP2x boards, the only quantization format supported by the NPU is 8-bit per-tensor asymmetric. For the best inference performance, please provide a model that follows this recommendation. If the provided model uses another quantization scheme, such as per-channel quantization, the generated NBG model will run mainly on the Graphics Processing Unit (GPU) instead of the NPU, increasing the inference time.
Warning: The provided ONNX/TFLite model should not contain dynamic shapes.
For more information, please visit the wiki page at https://wiki.st.com/stm32mpu/wiki/Category:X-LINUX-AI_expansion_package in AI - Tools category.
Supported STM32MPU series
| Supported series | Description |
|---|---|
| stm32mp25 | STM32MP25 device with dual ARM Cortex-A35 cores, ARM Cortex-M33 core, and NPU accelerator |
Analyze command
This command is not implemented yet. It will be available in a future version.
Generate command
Minimal options
To generate a model optimized for the STM32MPU platforms, there is no specific option: you only need to specify the path to the AI model and the target used. The generated output model is a Network Binary Graph (NBG) with the .nb extension; it is pre-compiled and optimized for your chosen target. The main generate command process for STM32MPUs is described below:
--model / -m
It is the path to an AI model. Currently, for STM32MPU targets, only TFLite and ONNX models can be converted to a Network Binary Graph (NBG).
--target
It is the name of the STM32MPU target for which you want to generate the optimized Network Binary Graph (NBG).
--entropy
This variable should be used when the provided model has a PER-CHANNEL quantization scheme. The entropy argument allows the tool to modify the model in order to convert PER-CHANNEL quantized layers to PER-TENSOR quantized layers. The entropy is a float value in the range [0, 1]. When the value is close to 1
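As an illustration, a per-channel quantized model could be converted during generation as follows. The model path is a placeholder, and the entropy value 0.9 is an arbitrary example from the [0, 1] range, not a recommended setting:

```shell
# Sketch: convert per-channel quantized layers to per-tensor during generation
# (0.9 is an arbitrary example value in the [0, 1] range)
stedgeai generate -m <model_file_path> --target stm32mp25 --entropy 0.9
```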
--input-data-type
This variable is only used when the provided model has FLOAT32 input(s). By default, if the model has FLOAT32 input(s), the STM32MPU tool automatically interprets the input(s) as FLOAT16. To keep the input type as FLOAT32, this variable needs to be set, for example: --input-data-type float32. Warning: the --input-data-type argument has no effect if the --output-data-type argument is not set to float32.
--output-data-type
This variable is only used when the provided model has FLOAT32 output(s). By default, if the model has FLOAT32 output(s), the STM32MPU tool automatically interprets the output(s) as FLOAT16. To keep the output type as FLOAT32, this variable needs to be set, for example: --output-data-type float32.
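Since --input-data-type requires --output-data-type to be set to float32, keeping both FLOAT32 inputs and outputs means combining the two options. A sketch, with the model path as a placeholder:

```shell
# Sketch: keep both inputs and outputs in FLOAT32
# (--input-data-type only takes effect together with --output-data-type float32)
stedgeai generate -m <model_file_path> --target stm32mp25 \
    --input-data-type float32 --output-data-type float32
```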
Additional options
--workspace / -w
It is the path to the workspace directory. The report of the generate command will be stored in this directory.
--output / -o
It is the path to the output directory. The generated Network Binary Graph will be stored in this directory.
--no-report
With this argument, the report will not be generated.
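The additional options above can be combined in a single invocation. A sketch, where the model path is a placeholder and the directory names are arbitrary examples:

```shell
# Sketch: custom workspace and output directories, no report generated
stedgeai generate -m <model_file_path> --target stm32mp25 \
    -w my_workspace -o my_output --no-report
```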
Example
Generate only the Network Binary Graph (NBG)
```
$ stedgeai generate -m <model_file_path> --target stm32mp25
...
Generated files (2)
-----------------------------------------------------------
stm32ai_output/<model_name>.nb
stm32ai_ws/<model_name>_stm32mp25.json
...
```
Common errors
Trying to generate a model with arguments '--input-data-type float32' or '--output-data-type float32'
```
$ stedgeai generate -m <model_file_path> --target stm32mp25 --output-data-type float32
...
E010(InvalidModelError): Model output is not float32.
...
```
This means that the inputs or outputs of the provided model are not in float32 format. Please regenerate an ONNX or TFLite quantized model with float32 I/O and retry, or unset the '--input-data-type float32' or '--output-data-type float32' option.
Trying to generate a model that contains unsupported layers
```
$ stedgeai generate -m <model_file_path> --target stm32mp25
...
E010(InvalidModelError): Error during NBG compilation, model is not supported
...
```
This means that the model is not currently fully supported on the GPU/NPU due to some unsupported layers. Please try to use the ONNX Execution Provider or TFLite delegate by following the wiki page: https://wiki.st.com/stm32mpu/wiki/Category:X-LINUX-AI_expansion_package. These two methods are able to execute the supported layers on the GPU/NPU, and the unsupported layers will fall back to the CPU.
Validate command
Minimal options
To validate a model on an STM32MPU platform, you need to have an STM32MP25 board connected to your host Linux PC. There are two ways to do it:
- The PC should be connected to the board with a USB-C/USB-C or USB-A/USB-C cable on the USB-C DRD port of the STM32MP25. This connection creates an Ethernet connection over USB with the following IP address: 192.168.7.1.
- Or, the STM32MP25 must be connected to the internet and must have an IP address.
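Before running the validate command, the link to the board can be checked from the host PC. A sketch, assuming the USB connection described above with its default IP address:

```shell
# Check that the board is reachable over the USB Ethernet link
# (192.168.7.1 is the default address of the USB-C DRD connection)
ping -c 3 192.168.7.1
```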
The validate command sends the model to the board and runs the inference directly on the CPU or on the GPU/NPU, depending on the options passed as arguments. In parallel, the same model runs on the host PC, and the results of the two executions are compared.
A set of metrics is returned:
- acc: Accuracy (class, axis=-1)
- rmse: Root Mean Squared Error
- mae: Mean Absolute Error
- l2r: L2 relative error
- mean: Mean error
- std: Standard deviation of the error
- nse: Nash-Sutcliffe efficiency criterion, bigger is better, best=1, range=(-inf, 1]
- cos: Cosine similarity, bigger is better, best=1, range=(0, 1]
This allows the user to find out if the model executed on the board has similar results to the one on the host PC.
--model / -m
It is the path to an AI model. Currently, for STM32MPU targets, only TFLite and ONNX models can be converted to a Network Binary Graph (NBG).
--target
It is the name of the STM32MPU target for which you want to generate the optimized Network Binary Graph (NBG).
--mode
The only supported option for STM32MPUs is: --mode target
-d/--desc
This argument manages the descriptor given to the validate mode. The different options of the descriptor are separated by the `:` character. On STM32MPUs, the descriptor can contain 4 values:
- mpu: selects the execution of the STM32MPUs pass. This is the only mandatory option.
- <ip_address>: by default, the IP address is set to 192.168.7.1, which is the address given when the board is connected to a host PC using the USB-C DRD port.
- cpu / npu: allows the user to choose between CPU or GPU/NPU execution.
- model_path: it is possible to pass the model path as an argument; otherwise, the model path is automatically recovered from the '-m/--model' option.

In general, this argument is used as follows:
- To run on NPU/GPU: -d mpu:192.168.7.1:npu
- To run on CPU: -d mpu:192.168.7.1:cpu
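Putting the minimal options together, a validation run on the GPU/NPU could look as follows. The model path is a placeholder and the IP address is the default one of the USB connection:

```shell
# Sketch: validate a model on the board's GPU/NPU over the default USB link
stedgeai validate -m <model_file_path> --target stm32mp25 \
    --mode target -d mpu:192.168.7.1:npu
```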