Release Note

ST Edge AI Core Technology 2.2.0

Overview

Part of the ST Edge AI Suite (https://www.st.com/content/st_com/en/st-edge-ai-suite.html), the ‘stedgeai’ application is a console utility which provides a complete and unified Command Line Interface (CLI) to compile a pretrained DL/ML model into an optimized library that can run on an ST device/target enabling edge AI on MCUs, MPUs, and smart sensors.

This article documents all of the main feature changes, interface changes and reported defects that have been fixed. Check the STMicroelectronics support website at www.st.com/content/st_com/en/st-edge-ai-suite.html for the latest version.

Release v2.2.0

Enhancements and new features

Fixed limitation about the Keras 3 models generated with the Sequential API
STELLAR target
- Full support for stellar-pg devices
  - Arm® Cortex® R52 with NEON extensions (SIMD, dual precision floating-point) network runtime libraries added
  - Validation on target enabled for the Arm® Cortex® R52 cores with NEON extensions
- Stellar Studio AI plugin v2.2.0 updated accordingly
- Stellar Studio AI demos added to run inferences on different stellar-pg devices and cores
ST Neural-ART NPU target (stm32n6 + ST Neural-ART NPU)
- Add new N6 validation project configuration (usbc support)
- Add new N6 validation project for Arm® Cortex® M55 core only
- Add Nucleo-boards support for both validation projects
- N6_scripts:
  - Add a --clean option to the n6_loader.py to clean the validation project before building
  - Add a new cm55_loader.py script to ease loading of the CM55 validation project on stm32n6
  - Enhance logging (log files are more verbose than before)
- Enhance NPU support for runtime loadable model (relocatable model) - Python scripts v1.3
  - Support for LLVM-based toolchain environment
  - Add option to put the ecblobs (const data part) with params/weights
  - Allow absolute @ for external memory pool (activations)
- NPU compiler
  - Added HW support for Scale-Offset Max, Min, Neg operations
- Runtime
  - Introduced ll_aton_rt_user_api.h as the main user API include. Introduced LL_ATON_RT_SetNetworkCallback and deprecated LL_ATON_RT_SetEpochCallback.
STM32MPU target
- Re-enable support after backend update/alignment.
ISPU target
- Improved memory occupation and execution time of the pattern Convolutional layer + Non-Linearity + Pooling for int8 quantized models
- Improved execution time of Max pooling layers in int8 quantized models
- Improved execution time of Convolutional layers with per channel quantization
- Improved execution time of Convolutional / Dense layers in int8 quantized models

Reported defect fixes

ST Neural-ART NPU target (stm32n6 + ST Neural-ART NPU)
- Fixed unresolved symbols during the generation of the relocatable model
- Fixed compilation warnings during the RT runtime computation when RELATIVE address mode is used for a given memory-pool
- NPU compiler
  - Fixed various clang-tidy warnings and code style issues.
  - Fixed bugs in scale/offset computations, kernel value estimates, and quantization propagation.
  - Addressed issues with ECasm, ECtracer, and ECloader regarding encryption, patches, and relocations.
  - Fixed problems with batch2output mode of concats and min/max kernel estimates.
  - Corrected handling of less-than-8-bits initializers and quantization for unit tests.
  - Fixed error handling and improved assertion usage (converted asserts to ATONN_ASSERT or exceptions).
MLC target
- Fixed missing high-g accelerometer power down at the start of generated configuration
- Fixed issue with parsing of datalogs with single column
- Fixed parsing of “ext_x”, “ext_y” and “ext_z” columns in datalogs
- Fixed vAFE usage for ST1VAFE6AX
- Fixed supported MLC ODRs for ST1VAFE3BX
- Fixed feature names and sensitiviy parsing in vAFE only mode for ST1VAFE3BX
- Fixed crash on invalid feature type
- Fixed crashes when using the maximum or almost the maximum number of features
ISPU target
- Fixed wrong Batch_Normalization float parameters merge into Convolutional layers with ONNX QdQ int8 models

Known limitations

Caution AiRunner package associated communication protocol has changed:
- Newer versions of AiRunner will not work with older validation projects
- Newer validation project will not work with older AiRunner versions

Documentation

ST Neural-ART NPU target (stm32n8 + ST Neural-ART NPU)
- “How to update my project with a new version of ST Edge AI Core?” (new article)
- Update the “Runtime loadable model support” article.

CLI

Add option to set parametric dimensions in the shapes of the input tensors.
Add option to enable an ONNX simplifier pass before to import the ONN model.

Embedded C-api

no API breaking change.
The embedded inference client API (legacy) will be marked as DEPRECATED.

Release v2.1.0

Enhancements and new features

Add support for Keras 3 models (.h5 and .keras); back-compatibility to Keras 2 model has been maintained.
Add support for Keras 3 ops: absolute, arccos, arccosh, arcsin, arcsinh, arctan, arctan2, arctanh, argmax, argmin, ceil, cos, cosh, divide, equal, erf, floor, floor_divide, greater, greater_equal, hard_silu, leaky_relu, less, less_equal, log, logical_and, logical_not, logical_or, logical_xor, maximum, minimum, mod, negative, not_equal, relu, relu6, round, rsqrt, selu, sign, sin, sqrt, tan, tanh
Update Keras from 2.15 to 3.7.0
Update TensorFlow from 2.15.1 to 2.18.0
New layers:
- ATAN2 (TFLite)
- SCATTER_ND (TFLite)
- ScatterND (ONNX)
- SIGN (TFLite)
ISPU target
- Add support for sensor configuration in JSON file format for validation on target
- Add self-sufficient validation on target mode (does not require the generation step and the build of the validation application for ISPU)
- Solved long time for board auto-discovery when performing validation on target on some setups
- Improved performance of int8 multi-channel dense layers
MLC target
- Add support for new MLC-compatible devices (IIS2DULPX, LSM6DSV80X, ISM6HG256X, LSM6SDV320X)
- Improve ProfiMEMS board data injection functionalities for accelerometer devices
- Add support for the new ProfiMEMS board STEVAL-MKI109D
- Add full support for sensor configuration in JSON file format and remove support for UCF file format
  - Generate command now creates the sensor configuration in the JSON file format and the new header file formats (regular and minimal)
  - Validate command now accepts the argument --mlc-conf to provide a sensor configuration exclusively in JSON format
- Solved long time for board auto-discovery when performing validation on target on some setups
Stellar target
- Support for stellar-pg target (SR6P line - 32-bit Arm® Cortex® M4 automotive integration MCUs for dme and dsph cores)
  - Arm® Cortex® M4 network runtime libraries for GCC GNU, HIGHTEC and ARCLANG Arm compilers with FPU support enabled (single precision)
  - Validation on target not enabled for Arm® Cortex® M4 cores: only available for Arm® Cortex® R52+ cores
  - Switch to the new official generated json file (c_info.json)
  - Stellar Studio AI plugin v2.1.0 updated accordingly
  - Stellar Studio AI multicore demo added to run multiple NN on both R52+ and M4 cores
STM32 target
- Add support for stm32u3 series.
ST Neural-ART NPU support
- Add script/guideline to profile a deployed model
- Add support to generate a runtime loadable model (relocatable model)
- Add a toolsuite to support encrypted-weights inference
- New features
  - ReduceMean Support: Implemented support for ReduceMean and other Reduce* nodes.
  - Swish Activation Detection: Added peephole for Swish activation detection, optimizing models like YOLO and EfficientNet.
  - Encryption Support: Introduced encrypted weights and decryption nodes in the epoch controller.
- Performance improvements
  - Optimized cache maintenance and memory pool handling.
  - Improved hardware mapping for edge cases and extended support for broadcasting eligibility.
- Bug fixes
  - Fixed memory pool alignment issues.
  - Resolved bugs in Stream Engine limit computation and DMA address limit configuration.
  - Addressed compiler warnings and undefined behavior in arithmetic lowering.
  - Fixed JSON output generation and node mapping issues.
  - Fixed mapping and allocation issues for epoch controller parameters.

Known limitations

Keras 3 models generated with the Sequential API (Sequential class) are not fully supported. It is recommended to use the Model or Functional API to design models. A workaround to import this type of model with the ST Edge AI Core is to convert it to a TFLite model.
```
converter = tf.lite.TFLiteConverter.from_keras_model(loaded_model)
tflite_model_f32 = converter.convert()

with open(TFLITE_FILE, 'wb') as f:
    f.write(tflite_model_f32)
```
STM32MPU target
- Not supported with STEdgeAI Core 2.1 for compatibility reason (STEdgeAI Core 2.1 should be used)

Documentation

new articles about the ST Neural-ART NPU to support a relocatable mode and to explain how to profile a deployed model.
new article about weights encryption support on ST Neural-ART NPU
new article about the code generation output report “c_info.json”

CLI

no API breaking change
--full option is marked as DEPRECATED. It will be removed in the next release.

Embedded C-api

no API breaking change
The embedded inference client API (legacy) will be marked as DEPRECATED.
- Default value for the --c-api option will change in the next release to be set to st-ai.
- 'legacy' c-api will be always available but will no longer be fully tested or maintained in the futur release.
- It is recommended to use the new c-api.

Reported defect fixes

Removal of an installed component does not require anymore removal of the corresponding directory
Different bugs related to import of TFLite BATCH_MATMUL and ONNX Gemm/MatMul have been fixed

Release v2.0.0

Enhancements and new features

Update ONNX version from 1.10.2 to 1.15.0
Update ONNX runtime package from 1.17.3 to 1.18.1
New layers:
- BATCH_MATMUL (TFLite)
- BROADCAST_ARGS (TFLite)
- BROADCAST_TO (TFLite)
- GATHER_ND (TFLite)
- GatherND (ONNX)
- RELU_0_TO_1 (TFLite)
Layer extensions: MatMul with >2D inputs
Extend ‘validate’ flow to create different numpy npy files by input/output for original model and deployed c-model.
CLI options
- Add --input-memory-alignment and --output-memory-alignment to control I/O memory alignment
- Improve options to control the data layout for input and output tensors (that is, channel position): --inputs-ch-position and --outputs-ch-position which replaces --input-output-channel-position
- Introduce -–cut-input-tensors, --cut-output-tensors, --cut-input-layers, --cut-output-layers options to cut the model before starting the analysis
Support for STM32N6 series, including the ST Neural-ART accelarator™
- Arm® Cortex® M55 network runtime libraries with NVE support enabled (GCC GNU Arm, IAR, Keil® AC6 toolchains)
Support for stellar-pg target (SR6P line - 32-bit Arm® Cortex® R52+ automotive integration MCUs, Arm v8-R compliant)
- Arm® Cortex® R52+ network runtime libraries for GCC GNU and HIGHTEC Arm compilers with FPU support enabled (single precision)
- Stellar Studio AI plugin v2.0.0 updated with the Stellar SR6P3 support
- Stellar Studio AI SR6P3 demo added to perform the on-target validation
- Developer Cloud extension updated with the support of the SR6PX-EVBC4000P (Rev. A) board for the SR6P3
Extend support for stm32mp25 target
- Add validate command for validation on target
- Add --entropy option for generate command to convert on-the-fly per-channel quantized model to per-tensor quantized model (this can impact accuracy).
- Add --input-data-type and --output-data-type options for generate command to force the generated model to interpret inputs/outputs as float32 instead of float16 (these variables can only be used when the provided model has FLOAT32 inputs/outputs).
- Improve support for ONNX models
Add individual nodes execution-time measurement when performing validation on target for ISPU
- New firmware and template from GitHub (https://github.com/STMicroelectronics/st-mems-ispu) are necessary
Native support for QKeras 16-bits fixed-point quantization schemes for ISPU target (dense, convolutional, and ReLU layers)
Extend MLC validation on target datalog formats to support all ST MEMS software for logging data (MEMS-Studio, Unico-GUI, Unicleo-GUI, and ST AIoT Craft)
Add default behavior to MLC validation on target if LABEL_{N} ground truth column is missing for decision tree DT_{N}: metrics computation is replaced with predicted class frequency counts.
Add support for new MLC-capable devices (ASM330LHBG1, ST1VAFE3BX).

Documentation

Update Stellar articles to be more generic regarding the family devices supported
Update the Command Line Interface article to add the stellar-pg target
Stellar Studio AI user manual - Getting started with Stellar Studio AI plugin for artificial intelligence (AI) updated
Create a HowTo article to explain how to use the AiRunner package
Provide a set of new articles about the ST Neural-ART accelarator™
Enrich ISPU articles with additional details regarding generated code integration and validation on target
Add new article with definitions, gloassary and trademark
MLC device-specific documentation for newly added devices (ASM330LHBG1, ST1VAFE3BX)

Interface changes

CLI

--allocate-inputs and --allocate-outputs options are deprecated. This is now the default behavior of the CLI. The new options: --no-inputs-allocation and --no-outputs-allocation can be now used to avoid to request the allocation of the inputs/outputs buffers in the activations buffer.
--input-output-channel-position is removed. Replaced by --inputs-ch-position and --outputs-ch-position options.

Embedded C-api

no API breaking change

Reported defect fixes

Fix support to models with constant outputs
Fix support ONNX QdQ ConvTranspose
Fix computation of shapes for some layers
Fix optimizations of quantized convolution models
Fix support to quantized GEMM through floating-point fallback
Fix import of ONNX QdQ models with unsigned input
Fix possible issues when a workspace populated by previous releases is used
Fix generated network details data structure partially allocated in data RAM
Fix fallback to float approximation error for 16-bits QKeras models
Fix C implementation selection for validation on host with target ISPU in some specific cases
Fix convolutional layers C implementation mapping when number of kernels >= 8 for ISPU target

Limitations

When one installed component is removed you need to manually remove the directory with the component name in the Utilities/<osname>/targets directory. For example if you remove the mlc component in the installation, you need to manually remove the Utilities/<osname>/targets/mlc directory.

Release v1.0.0

Initial release of the ST Edge AI Core technology. It introduces the support of the new ST devices: MCU, MPU, and smart sensors and a flexible installer to install only the requested components for a given target.

Enhancements and new features

Initial release.

Interface changes

No changes.

Reported defect fixes

Initial release.

Previous releases

What is new in X-Cube-AI V9.0.0?

Enhancements and new features

CLI
- core is now based on a new executable ('stedgeai') targetting different ST devices. The underlying technology and asociated flows (analyze, generate and validate) are similar to the previous version, only some options have been harmonized. It introduces a new embedded C-API: st-ai, not yet documented allowing to unify the interface and providing a relative more simple C-interface.
- the “quantize” command has been removed. Note that the quantized models through the tensorflow, ONNX, qKeras or Larq are always supported (see [QUANT][X_CUBE_AI_QUANT] article)
new operators
- ConvTranspose (KERAS)
- NOT_EQUAL, GELU, ZEROS_LIKE (TFLite)
- quantized 8b UNIDIRECTIONAL_SEQUENCE_LSTM (TFLite)
- Mod (ONNX)
Core - Code generator
- introduce new embedded C-API (experimental), option '--c-api st-ai' should be used.
Embedded Library
- bz167868 - remove usage of the CRC IP to lock the network-runtime library
- bz154982 - optimize implementation of the quantized dense layers
- bz157480, bz154026, bz161225, bz167236, bz162208 - multiple optimizations for quantized PW/DW layers
- bz146681 - enhance LSTM support for quantized 8b version (TFLite operator)

Major bug fixes

(bz161128) fix wrong zeropoint sign mgt in SUB operation with scalar as second operand
(bz160077) fix to support multiple input tensors produced by a same operator
(bz164547) fix the merge of the BN in dense layer
(bz164479) fix code generator with Conv1D and stride = 2
(bz160262) fix internal shape computation for ONNX flatten operator

What is new in X-Cube-AI V8.1.0?

Enhancements and new features

UI
- add specifix checkbox allowing to set the “–no-onnx-io-transpose” option
- include new version of [TensorFlow Lite for Microcontrollers][X_CUBE_AI_TFLM] (SHA-1 : 6a1803)
CLI
- enhance global model format detection, improve the report of the number of operation per layer
Core - Code generator
- optimization of the quantized kernel implementations: CONV2D (bz151774), ADD (bz153931)
- add support for new layers:
  - Einsum (bz121864), BitShift / ONNX Operators
  - Support to the gather with second input variable (TFLITE & ONNX Operators)
  - Support to LSTM with constant inputs not put in initializers (bz141640)
Embedded Inference API
- no new features

Major bug fixes

(bz129411) TFLITE - support for transpose of batch dimension
(bz152744) ONNX - enhance import of concat operator managing arbitrary inputs order
(bz148917) ONNX - enhance import of resize operator considering the constant inputs
(bz135654) TFLITE - enhance Dilated convolution 1D/2D (SpaceToBatchNd/BatchToSpaceNd)
(bz141553) ONNX - support transpose lop between dequantize and convolution ops
(bz144706) run-time lib - extend n_macc resolution to 64b
(bz149752, bz150344) code gen - memory allocator, enhance multiple heap support

What is new in X-Cube-AI V8.0.0?

Enhancements and new features

CLI
- for a given model, the analyze and generate commands report the [requested memory size][CLI_ref_requested_memory_size] (FLASH and RAM) for the deployed AI stack (infrastructure + used kernels)
- Relocatable mode - Initial support for [Custom layers][X_CUBE_AI_KERAS_CUSTOM], only self-containts custom c-files.
- (bz138721) - relocatable mode - API files (ai_reloc_network.c./.h) are exported in the bundle of the generated files.
Core - Code generator
- (bz111201) ONNX - add support for [quantized ONNX models][X_CUBE_AI_QUANT] (QDQ format)
- (bz135734) ONNX - add scalar tensors support
- (bz136311) DQNN - improve support to hybrid int1/int8 convolution
- Upgrade ONNX runtime module to 1.13.1 version (ONNX 1.10.2 version is not upgraded), allowing QDQ support
- (bz136824) KERAS - support to StridedSlice in Keras TFOpLambda/Lambda Layer
- add support for new layers:
  - QLinearMul (ONNX QOperator)
- (bz136250) Renderer - improve concat deployment
Embedded Inference API
- (bz112264) Align with CMSIS 5.9
Embedded documentation
- add [new article][X_CUBE_AI_QUANT] to describe the support of the quantized models. It provides the guidelines/options/limitations to quantize a TensorFlow and ONNX model.
- STM.AI “quantize” is marked as deprecated

Major bug fixes

(bz113917) ONNX - channel first layer with constant eltwise input
(bz138240) TFLITE - support of pad on channel dimension
(bz139113) KERAS - fix import of the 4D placeholders
(bz136824) KERAS - support to lambda layer with closure
(bz135790) STM32 test app - fix compilation warnings when -pedantic option is used

What is new in X-Cube-AI V7.3.0?

Enhancements and new features

User Interface
- upgrade the support of the [TensorFlow lite for microcontroller runtime][X_CUBE_AI_TFLM]
- add optional selector to define the objective of the optimizations
CLI
- add global optimize option to enable or not the optimization passes to tend to reduce the requested RAM size or to minimize the inference time (or latency) (see ['-O/--optimization'][CLI_optimization_option] option)
- cli:reloc - use the ISO C99 dialect to compile the c-files ('-std=c99' option)
Core - Code generator
- (bz133234) ONNX import - support model with input fixed shape and output with undefined shape

Major bug fixes

(bz133214) CLI - improve log of the “generate” command. '-v 2' option can be used to have more details about the imported/generated model.
(bz133045) CLI - ai runtime library files from the user workspace are not systematically updated between two “valid” command.
(bz133810) CLI - memory-pool - fix INTERNAL ERROR when requested RAM don’t fit in memory-pools.
(bz131095) stm32 app - fix compilation warning with IAR 9.x
(bz131121) stm32 app - fix HAL support for STM32MP1
(bz131254) stm32 app - fix support for STM32C0 series
(bz133743) code gen/runtime - fix support of concat with large number of inputs
(bz133783) code gen/runtime - fix Softmax on non-axis channel
(bz132916) DQNN / Larq - fix computation of the quantized weights
(bz133202) Fix regression when a BN is folded in a DW layer
(bz132508) GUI - restore correct “Allocate Inputs/Outputs” check-boxe values between two sessions
(bz131253) GUI - fix display of the available external SDRAM for STM32F469i board
(bz134118) GUI - fix validation on target for STM32U5 series

What is new in X-Cube-AI V7.2.0?

Enhancements and new features

User Interface
- TensorFlow lite for microcontroller is always based on the code available in 7.1 release
- unify and centralized analytics (UI & CLI)
CLI
- add ['--quiet'][CLI_common_args] option to disable the display of the progress bar during the execution of the command.
- to generate a relocatable binary model, the ['--lib’] is no more requested, root lib directory is automatically detected from the installed pack.
- '--range' option can be used with a simple argument to pass a simple value.
- number of operations by data types is reported [(by layer and for entire generated model)][CLI_op_by_type]
Core - Code generator
- different implementations of c-kernel have been improved to have a better latency. Counterpart is to use more memory, and depending of the used models and the expected options (allocate-inputs,..) the activations buffer can be a little bit more important in comparaison of the previous version.
- TensorFlow 2.9 and Keras.io 2.9 modules are used to import the Keras (h5 file) and TensorLite file (tflite file) models ('tensorflow.keras' is no lore used).
- upgrade ONNX package to 1.10.2 version (no change for the ONNX runtime 1.9.0)
- new layers:
  - complete TreeEnsembleRegressor support (ONNX)
  - QActivation, QBatchNormalization, QDense, QConv2D, QConv2DTranspose, QDepthwiseConv2D (QKeras/Keras)
  - QuantConv2D, QuantDense, QuantDepthwiseConv2D (Larq/KERAS). See the new article: [“Deep Quantized Neural Network (DQNN) support”][X_CUBE_AI_DQNN]
  - enhance TimeDistributed support, wrapped layers: Conv2D, Dense, Flatten, MaxPooling2D, ZeroPadding2D, Dropout (KERAS)
- improve the error message when a feature/operator is not supported/implemented
- no more limitation of depth multiplier different than 1 for depthwise quantized layers
- compression level (medium, high) can be applied for GEMM operator
- add STM32G4/U5 series target for relocable binary model
Validation process
- add support for models which have the IO dimension shapes up to 6D (including the batch size)

Major bug fixes

(bz122230) UI - fix report of the overall flasl/ram size in the main tabl (multiple network UC)
(bz125178) UI - clarify usage of the “report’s output directory” user field
(bz126575) UI - fix global memory footprint reported with STM32U5 series
(bz126177) UI - fix issue when changing the runtime from Cube.AI to TFLM
(bz121296) CLI - add STM32U5 series target for relocable binary model
(bz121296) CLI - fix “–range” option allowing 0.0 as min value
(bz123747) CLI - allow directory path name with spaces for relocatable object generation
(bz126042) Core - fix support of MEAN layer when not applied on Channel dimension
(bz122402) Core - fix import of LocalResponseNormalization as lambda layer
(bz130213) Core - fix support for depthwise quantized layer with depth multiplier different than 1

What is new in X-Cube-AI V7.1.0?

Enhancements and new features

Enhancements of the User Interface
- adding new advanced setting interface to define the memory pools
- fix generation of the application template for the relocatable binary model
- add multiple-IO support for the application template
CLI
- introduce preliminary and optional option to pass a description of the target to define the memory pools (see ['--target'][CLI_target_option] option)
- add free text entry to enter specific command line options
Core - Code generator
- add support for new layers:
  - Normalizer, StandardScaler, MinMaxScaler, MaxAbsScaler, RobustScaler, RandomForest, ExtraTree, GradientBoosting, HistGradientBoosting, TreeEnsembleClassifier, ReduceLogSumExp, Scaler, TreeEnsembleClassifier, LinearClassifier, Where, Clip (ONNX-ML/ONNX)
  - TFOpLambda, stateful GRU Layer (Keras)
  - SPLIT_V, ReverseV2, SQUARED_DIFFERENCE (TFLite)
- [multi-heap support][API_multiple_heap] support allowing to split the activations buffer in different memory segments and to have first a better usage of the device with fragmented memory devices (no contignuous memory) like the STM32H7 series.
- add support for STM32 ARM Cortex M0 and ARM Cortex M0+ based
- upgrade TensorFlow lite runtime 2.7.0 and ONNX 1.9
- (bz100948) support of weights for the Keras custom layers
Embedded Inference API
- C-API break has been introduced to initialize the instance of the C-model (refer to [“API Breaking Change”][X_CUBE_AI_API_BREAK] article)
- report specific code error ('AI_ERROR_CODE_LOCK') if CRC IP not accessible (bz110629)
- extend relocatable C-API to manage the CRC IP as shared resource
Template, Validation and System performance built-in test applications
- update generated code to use the new C-API (multi-heap, new ai_buffer definition)
- improve code size (gain of ~20KB, toolchain dependent). The default 'printf()' function implementation has been replaced by a low cost proprietary implementation.
- align the TFLM-c wrapper to be able to use the TFLM 2.7 runtime.
Embedded documentation
- add new article to describe the [API breaks][X_CUBE_AI_API_BREAK] with the rationale and to provide guideline/tips to limit the integration issues.
- update [“Embedded Inference Client API”][X_CUBE_AI_API] article to describe the multi-heap support, introduction of helper function to simplify the creation and the initialization of the c-model, new 'io_buffer' description for the N-dimensional tensor support.

Major bug fixes

(bz114786) CLI - fix stm32ai CLI suppoprt in Collab/Jupyter note-book environment
(bz103040/bz117764) Core - support for PACK layer with dimensions>4
(bz90708) Core - support for unpack& transpose layer with dimensions>4
(bz117202) UI - fix incorrect heap/stack size definition for STM32H7 dual core projects
(bz112369) UI - fix compuation of the requested arena size for TFLM runtime
(bz112369) UI - improve report of the unsupported operators for TFLM runtime
(bz111842) UI - close correctly the advanced settings window
(bz108953) Core - improve support of ONNX LSTM operator (Now multiple outputs (final state, hidden state, cell state) for LSTM are allowed)
(bz116529) Core - improve shared axes support in PReLU (support the case where 1 is present in the middle of input shape)
(bz110221) Core - improve constant propagation for Pack and Reduce operators
(bz101344) Core - for ONNX model, sottmax not on channel axis is now supported by adding a transpose before and after softmax.
(bz112224) Core - improve support of 2D ONNX MatMul operator with multiple inputs (second input not constant).

What is new in X-Cube-AI V7.0.0?

Enhancements and new features

Enhancements of the User Interface
- enhance the STM32 development board setting. Insure that the clock system is correctly set by default for best performance.
Core - Code generator
- Add support for ONNX integer arithmetic operators, i.e. mul/div/sum/sub for signed unsigned 8/16/32 bit formats
- Add support for new layers:
  - Argmin/Argmax (ONNX – TFLite)
  - ArrayFeatureExtractor (ONNX)
  - Cast (ONNX - TFLite)
  - LSTM (ONNX)
  - SVM Classifier – SVC (ONNX)
  - Unpack (TFLite)
  - ZipMap (ONNX)
- Add partial support for new layers:
  - TFLITE_SPACE_TO_BATCH_ND/BATCH_TO_SPACE_ND (TFLite): when generated from the conversion of Keras dilated convolution
- Add support to iForest ML ONNX operator
- Add support for new non-linearities:
  - HardSwish
  - Swish
- Extended support to:
  - ReduceMean (ONNX) - Added support when applied to channel axis
  - Mean(TFlite) - Added support when applied to channel axis
  - Split (TFLite) layers - Added support when output tensor size on splitting axis is 1
  - TFOpLayer – Added support to ceil, floor, LRN, reduce_min, rsqrt, slice, strided_slice operators
- Added check for tflite files
- Batch size information is now correctly propagated through layers in intermediate representation
- Improved identification of dimensions order for ONNX models
- Intermediate representation now uses the same channel position of the imported model
- Order of output of generated C code is now the same of the analyzed model
- TensorFlow runtime version moved to 2.5
- Modified the generated C code from ONNX models, in order for the user to pass data in channel first order
- Precision improvement for the quantized models in particular for the models which the scale factors close to zero
Embedded Inference API
- Add optional user callbacks to manage the CRC IP as a shared resource
CLI
- Relocatable - Add --ihex option to generate Intel hexadecimal object file
- Improve check of the input validation data (shape/type)
- Add --seed option to set the initial seed for the generation of the random data
Validation process
- Add two new metrics: bias/mean and standard deviation of the error
- Provide a Python module ai_runner to export an unified inference interface for the different X-CUBE-AI run-time: X86 or STM32 (including the TFLm runtime) through the aiValidation test application.
Validation and System performance applications
- aiValidation: Add inference time per layer for TFLm runtime
TFLM runtime
- Upgrade the source files with TensorFlow Lite for Microcontroller 2.5.0
Embedded documentation
- How to upgrade a project with a new version of the library
- How to to run locally a generated c-model in C or Python environment
- Manage the CRC IP as a shared resources
- Add in TFLM article, a table to compare the supported operators between TFLM 2.5.0 and X-CUBE-AI
- Update the FAQ
- Complete/update the advices about the built-in validation flow

Major bug fixes

(bz103986) Scalar activations are now correctly supported. An error is risen for scale equal to Infinite in quantized models
(bz106312) Keras SimpleRNN is now correctly imported
(bz108736) Fix potential bug about memory buffer corruption on multiple output network models when allocate-outputs option is enabled
Fixed import of slice layer when size is –1
Fixed support to multiple tensors produced by same layer and used in same layer
Fixed merging of pad into multiple successive convolution layers
Fixed support of ONNX Reduce layers with keep_dims=False

What is new in X-Cube-AI V6.0.0?

Enhancements and new features

Enhancements of the User Interface
- add support of code generation with TensorFlow Lite Micro (Validation, System Performance, Template)
  - based on 2.3.1 git repository of Tensorflow
- add panel to add custom layer json configuration file
- add panel for external weight placement, UI + Automatically update scatter files or create it
- add checkbox to add the option –classifier for the validation (x86/stm32) (bz97448)
- enhance dual core (multiple context support) code generation with MX6.1
  - you can have one instance of X-CUBE-AI on the CortexM7 and one on the CortexM4
new operators/layers
- KERAS: Average, Custom layer, Lambda wrapper: tf.math.abs, tf.math.acos, tf.gather.. 25+
- ONNX: ConstantOfShape, DequantizeLinear, Equal, Gather, Greater, GreaterOrEqual, Identity, Less, LessOrEqual, Mean, Neg, Not, Or, QLinearConv, QLinearMatMul, QuantizeLinear, ReduceL1, ReduceL2, ReduceSumSquare, Shape, Xor
- TFLITE: EQUAL, EXPAND_DIMS, FILL, GATHER, GREATER, GREATER_EQUAL, L2_NORMALIZATION, LESS, LESS_EQUAL, LOGICAL_AND, LOGICAL_NOT, LOGICAL_OR, MIRROR_PAD, PACK, REDUCE_ANY, REDUCE_MAX, REDUCE_MIN, REDUCE_PROD, SHAPE, SQUARE, TILE, UNIDIRECTIONAL_SEQUENCE_LSTM
Code generator
- add initial support for the Keras lamdda and custom layers
- add initial support of the Keras stateful LSTM layer (batch-size=1 limitation)
- add warning messages for int/hybrid layers not supported
- improve error messages for unsupported-layers
- improve support for constant propagation
- improve prelu/relu integer support
- better support for generated TFLite model converted with TF2.3+ converter
- add new C-defines to know the dimension values by input/output tensors (bz91718)
- add support to integer element wise functions
- add support to nested Sequential model in Keras
- generate new <network>_config.h file with the tool version definitions
  - run-time check of the TOOLS API is now strict, no forward/backward binary compatibility is considered
CLI
- add new command (supported-ops) to list the supported operators by Deep Learning Framework
Validation process
- for the models with the inputs in integer, random data generator is now adapted to insure that the data are uniformly generated between [-128, 127] (or [0, 255] respectively).
- add new option --range allowing to set the min/max values for the random data (default: [0, 1[)
- add support to valid a Keras model with stateful LSTM layer (limitation: batch-size=1)
- add bool type support
- add code to support STM32 USB-CDC profile
Validation and System performance applications
- add support for TFLm run-times
Embedded documentation
- add new article about the TensoFlow Lite for microcontroller support
- add new article about the Keras stateful LSTM support
- add new article to explain how to use the USB CDC profile for aiValidation firmware
- update/rework the metric article to clarify the expected data for validation
- remove section about the Qmn arithmetic support

Major bug fixes

UI - fix dual core code generation with MX6.1
UI - fix error when using using “Generate relocatable network” in X-CUBE-AI UI with another name than network (bz96349)
validation - fix order of the inputs/outputs for the Keras/TFlite model with multiple I/O (bz101481)
validation - enable support for model w/o weights and w/o activations buffer
test applications - fix compilation issues with H7 DUAL core (bz9060)
test applications - fix compilation issue with Keil and -gnu option (bz100215)
enhance validation file importer to support only one sample by npz file (w/o batch dimension) (bz95755)
fix problems with shape of RepeatVector and recurrent layers
fix TFlite L2_Normalization import
fix bug on Conv2D float margin computation
fix problem when scale bias layer is merged in convolutional layer with zero bias
fix computation of complexity of dense
fix scratch buffer shape computation for conv2d integer
fix observer API to model with only one layer
add support to multiple use of same activation tensors (bz72010)
enhance eltwise optimization
improve integer clip support
fix ResizeBilinear TFLite op (half_pixel_centers parameter support for +TF2 version)

What is new in X-Cube-AI V5.2.0?

Enhancements and new features

Enhancements of the User Interface
- add UI options and update the project generation to test and to validate a relocatable binary model
  - same level of features for the aiValidation and aiSystemPerformance test applications (only multiple networks is not supported in this case)
  - GNU ARM Embedded tool-chain is automatically downloaded if not available
- add new UI button (Advanced Settings)
  - define the user output directory for the generated files (reports, data, ..)
  - check box to allocate the outputs in the activation buffer
TF Lite and Keras model support
- for the import and the validation of the TF Lite and Keras models, TensorFlow 2.3.0 package is used.
  - provide a better support of the TFLite models generated with TFLiteConverter 2.2 and later
- Keras.io support has been removed (TF_KERAS environment variable is no more available)
  - Keras NCHW models can be no more validated (code generation is always supported)
  - Up-to Keras 2.4.0 models can be imported
Inference run-time library
- performance improvement (with conv2D int8) wherever weights location (in internal or external flash)
Code generation
- add new “generate” options to be able to generate a relocatable binary model - bz80560/bz80557
- for the split of the weights, each c-array is now prefixed with the name of the model to facilitate the data placement with IAR tool-chain. “static” keyword has been removed - bz89998/bz90000
- outputs buffers can be allocated in the activations/working buffer - bz89643
Validation process
- add percentage of execution time by layer in the report for the validation of target
- add the computation of the L2r metric systematically
- add name/shape of the output tensor
Validation and System performance applications
- re-work the source files to share the common functions between both applications and to avoid to mix STM32 specific code and the AI code.
- both are now fully based on the Platform Observer API
  - no system heap is no more requested to use the aiValidation application test
- add support for relocatable binary model
- remove warnings for STM32L5 compilation with ARM Compiler 6
Embedded documentation
- add new article for the relocatable binary model support
- add new snippet code for the usage of the Platform Observer API
  - “End-of-process input buffer notification”
  - “Dumping intermediate output”
- fix typo and mismatch with the current release.

Major bug fixes

fix/improve support of TFlite reshape operator before FC operator (dim > 2) - bz93218
add support to align_corners attribute of TFLite ResizeNearestNeighbor operator (TF2.3)
improve the optimization step to remove the redundant conversions (code gen) - bz88996/bz92379
(minor) reinforce the coherency check of the output types before calculation of the metrics
fix ONNX importer for the reshape layer
fix/improve memory allocator for specific corner case - bz88995
(minor) report clear error when the output or workspace option is passed w/o argument - bz89157
fix/improve reported information of the IO model (corner case)
improve the execution time for the conv2d float kernel when the weights are located in the STM32 internal flash
fix a bug in the final computation op for the optimized con2d pool integer kernel
(minor) - fix decimal separator for reported time value - bz90001
improve execution time of the clip nl operator - bz90289
fix overflow calculation issue in the integer operation when scales values are closed to zero: < 10e-7 - bz90287
UI - fix code generation in advanced mode (simplify overall files copy) + fix Makefile generation
UI - fix keil generated project file for L5 (add DSP if not already there)
UI - fix report of the metrics (multiple outputs case) - bz93608
test applications - remove warnings for STM32L5 compilation with ARM Compiler 6

What is new in X-Cube-AI V5.1?

Enhancements of the User Interface
- New graph to display the C-graph (operators and associated tensors)
- New graph to display the usage of the “activations” buffer
C-Code generation / network runtime library
- Adding support to generate a c-array by weight/bias tensor
- Adding textual c-graph description (operators and associated tensors) in the reports
- Lighter network-runtime library support. Granularity of the inference kernel functions are aligned on the used integer scheme to decrease the final code size
- Improvement of the memory peak usage (RAM and ROM size)
- Improvement of the inference time of Convolution kernels (float and integer type) in particular when the weights are placed in a low latency memory device
Embedded inference client API extension
- Adding platform observer API
Validation
- Adding support for int8/uint8 validation files
System performance application
- Adding executing time by layer
Documentation
- Adding specific article for the quantization aspects

Major bug fixes and improvement

adding ONNX Softmax, Hardmax, LogSoftmax, Resize operators
upgrading subset of operators up-to Opset to 10 support of ONNX 1.6
adding batch normalization and mul/add support for integer layers
adding Keras simple RNNs support
adding support for quantized concat operator
removing system heap usage for recurrent layers
fixing support for Keras GRU/LSTM/RNN layers when use_bias==False
fixing support for Keras GRU/LSTM layers with non-default nonlinear function
fixing ONNX shape interpretation (Concat and Slice) for 2D tensors
fixing duplicated inputs for element-wise operators
fixing support for TFLite/ONNX model with multiple IO
fixing MACC calculation for ONNX GEMM layer
improving parsing of TFLite reshape operator
fixing parsing of Keras RepeatVector operator

What is new in X-Cube-AI V5.0.0?

Support of ONNX floating-point network
- A subset of operators from Opset 7, 8 and 9 of ONNX 1.6 is supported
Adding “per-channel” support for quantized model
- Enhanced support for Post-quantized and training-aware TensorFlow lite models
- Keras post-quantization process update
Support of Mutiple IO network
- Model for multiple IO are now fully supported
Bug fixes and enhancement
- Improvement of the RAM usage

What is new in X-Cube-AI V4.1.0?

Support of TensorFlow Lite quantized network
- Support latest TensorFlow Lite quantized networks optimization, both floating point and integer quantized networks can be imported in the tool.
Integer arithmetic for quantized model
- X-Cube-AI has full support for integer quantized networks allowing to optimize flash memory usage and inference time while keeping a good accuracy. Keras post training quantization tool also supports integer quantization.
Support of external memories:
- The generated code can be configured to use external flash for weights or external ram for activations or weights.
- This allows to run bigger networks
- On ST STM32 boards the usage of external memories is fully plug and play

What is new in X-Cube-AI V4.0.0?

Support of TensorFlow Lite in floating point
Support of Keras network quantization
Enhancements of the User Interface
- One click automated validation project build and run on the target
- New graph display
- New Logging window with cancel support that will give the results so far
New command line interface allowing to perform the following actions on a neural network
- analyze to give complexity, needed Flash and Ram size
- generate the C code
- validate on the desktop and on the target
- quantize using the min/max or greedy algorithm

What is new in X-Cube-AI V3.4.0?

Bug fixes and enhancement in the user interface
Support of additional Keras layers:
- DepthwiseConv1D: Depthwise 1D convolution
- SeparableConv1D: Depthwise separable 1D convolution:
Better support of Linux and MacOS

Release Note
ST Edge AI Core Technology 2.2.0

ST logo Information in this document is provided solely in connection with ST products. The contents of this document are subject to change without prior notice. © Copyright STMicroelectronics 2025. All rights reserved. www.st.com