2.2.0
STM32N6 Example projects & tips for creating new projects


ST Edge AI Core


for STM32 target, based on ST Edge AI Core Technology 2.2.0



r1.1

Overview

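This article describes typical applications that can be used as a starting point: the Hello-world project and the Secure / Non-Secure (TrustZone®) project.

Two additional projects, NPU Validation and CM55 Validation, are also presented. They can be used with a host application using a specific AiRunner package.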

Project source location

Sources of the different projects can be found in $STEDGEAI_CORE_DIR/Projects/STM32N6570-DK/Applications/ folder.

$STEDGEAI_CORE_DIR (%STEDGEAI_CORE_DIR% on Windows) indicates the root location where the ST Edge AI Core components are installed.

Hello-world project

This project comes in two flavors: an IAR Embedded Workbench® IDE project and an STM32CubeIDE project.

Overview

This project demonstrates the essential steps to perform an initial inference using the ST Neural-ART NPU. It includes hardware initialization and software setup for running inferences on a deployed model. The project continuously performs inferences, measures the execution time of each inference, and sends the results via UART.

This section describes the main points of attention: setting up the clocks, the NPU subsystem, and the RIF components. The RIF components presented here are the resource isolation slave unit for address space protection (RISAF) and the resource isolation framework security controller (RIFSC).

Clocks

Overview

Clocks must be configured in every kind of project. The main guidelines provided here address the basic use-case of running an inference on the NPU of the discovery-kit board.

  • Internal clocks
    • The NPU is clocked by sysc_ck. The source of this clock is set by selecting SysClkSource (by setting SYSSW of RCC_CFGR1)
    • The NPU RAMs (AXISRAM3/4/5/6) are clocked by sysd_ck. The source of this clock is set by selecting SysClkSource (by setting SYSSW of RCC_CFGR1)
    • Setting SYSSW to 0b11 allows the NPU and the NPU RAMs to use different clocks (this configuration is used in the example project)
  • Externalized clocks
    • On the STM32N6 discovery-kit, the external RAM is connected to the XSPI1 peripheral, whose source clock can be set in RCC_CCIPR6; the example project sets its source to ic4
    • On the STM32N6 discovery-kit, the external flash is connected to the XSPI2 peripheral, whose source clock can be set in RCC_CCIPR6; the example project sets its source to ic3

Clock speeds

Maximum clock speeds (already shown here) are summarized below:

Max clock (MHz)                              Normal mode   Overdrive mode*
CPU                                          600           800
CPU RAMs                                     400           400
CPU AXI interconnect                         400           400
NPU                                          800           1000
NPU RAMs                                     800           900
NPU AXI interconnect                         800           1000
External RAM (on STM32N6 discovery-kit)      200           200
External flash (on STM32N6 discovery-kit)    200           200

* Refer to the datasheet for more information

The values reported in the table above are the frequencies that can be targeted when VDDCORE = 0.8 V (Normal mode) or VDDCORE = 0.9 V (Overdrive mode).
Overdrive mode makes it possible to “overclock” the whole system when the power supply voltage is raised slightly (within a tolerable range).
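
As a minimal sketch of how these limits might be checked in a project, the helper below stores the values of the table above and compares a requested frequency against them. The enum, array, and function names are hypothetical and are not part of the delivered projects; only the numbers come from the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clock domains of the table above (hypothetical names). */
typedef enum {
  CLK_CPU, CLK_CPU_RAMS, CLK_CPU_AXI,
  CLK_NPU, CLK_NPU_RAMS, CLK_NPU_AXI,
  CLK_EXT_RAM, CLK_EXT_FLASH,
  CLK_COUNT
} clk_domain_t;

/* Maximum frequency in MHz: column 0 = normal mode, column 1 = overdrive. */
static const uint16_t clk_max_mhz[CLK_COUNT][2] = {
  [CLK_CPU]       = {600, 800},
  [CLK_CPU_RAMS]  = {400, 400},
  [CLK_CPU_AXI]   = {400, 400},
  [CLK_NPU]       = {800, 1000},
  [CLK_NPU_RAMS]  = {800, 900},
  [CLK_NPU_AXI]   = {800, 1000},
  [CLK_EXT_RAM]   = {200, 200},
  [CLK_EXT_FLASH] = {200, 200},
};

/* Returns true when the requested frequency does not exceed the table limit. */
static bool clk_within_limit(clk_domain_t domain, uint32_t mhz, bool overdrive)
{
  assert(domain < CLK_COUNT);
  return mhz <= clk_max_mhz[domain][overdrive ? 1 : 0];
}
```

For example, clk_within_limit(CLK_NPU, 1000, false) is false, because 1000 MHz is only listed for the NPU in overdrive mode.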

Focus: doing an inference

When doing an inference with the advised options (LL_ATON_RT_MODE=LL_ATON_RT_ASYNC), the code executes WFE instructions, which switch the core to low-power modes. Unless the RCC is configured otherwise, this gates all the clocks connected to the NPU and the NPU RAMs, leaving the NPU no way to complete its work: the inference never finishes.

It is therefore required to configure carefully the clocks that must not be gated when going into low-power modes, especially:

  • All internal memories used during the inference must remain clocked in low-power mode
  • All external memories used during the inference must remain clocked in low-power mode (and their associated peripherals, XSPI1, XSPI2)
  • The NPU must remain clocked
  • The NPU cache (CACHEAXI) must remain clocked if used
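
The checklist above can be expressed as a small host-side model, for instance to review a clock configuration before flashing. The flags below are hypothetical names, not RCC register fields; in a real project each of them corresponds to a low-power clock enable that has to be set in the RCC.

```c
#include <stdint.h>

/* Hypothetical flags, one per resource of the checklist (not RCC bits). */
enum {
  RES_NPU         = 1u << 0,
  RES_CACHEAXI    = 1u << 1,
  RES_NPU_RAMS    = 1u << 2,  /* AXISRAM3/4/5/6 */
  RES_XSPI1_RAM   = 1u << 3,  /* external RAM and its XSPI1 controller */
  RES_XSPI2_FLASH = 1u << 4   /* external flash and its XSPI2 controller */
};

/* Returns the resources used by the inference whose clock would be gated
 * when the core enters low-power mode. A result of 0 means that every
 * resource used by the inference stays clocked. */
static uint32_t gated_during_wfe(uint32_t used, uint32_t kept_clocked)
{
  return used & ~kept_clocked;
}
```

A non-zero result names the resources that would stop during WFE and therefore stall the inference.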

Setting up the NPU

To properly set up the NPU:

  • Clocks must be set up properly.
  • The NPU must be clocked/reset.
  • (The CACHEAXI must be clocked/reset if needed).
  • Resource isolation framework (RIF) must be configured properly (after clocking the various IPs) by setting RISAF/RIFSC configurations.

CACHEAXI

The AXI cache (CACHEAXI) is placed on the AXI interconnect of the NPU to improve the performance of data traffic. The goal of this cache is to cache data accessed by the NPU in external memories.

The initial configuration of the cache is simple: it only needs to be clocked, reset, and enabled.

Resource isolation slave unit for address space protection (RISAF)

  • RISAF is part of the RIF and used to protect memory accesses.
  • Each memory resource can be split into small regions, each with its own access rights.
  • The number of available regions and the granularity of the ranges depend on the resource.
  • The “access rights” defined for each region are based on:
    • Secure / Non-Secure attribute of the transaction.
    • Privileged / non-privileged attribute of the transaction.
    • The “compartment ID” (CID) of the transaction (for memories connected to AXI bus).
    • Whether the access is a read or a write.

If RISAFs are not configured, only accesses that are secure, privileged and with CID=1 are allowed.
As such, it is important to configure the RISAFs correctly before attempting to access memories, or the RISAFs will likely filter out some accesses.

Whenever a RISAF filters an access:

  • The RISAF logs the access violation characteristics in its registers (transaction characteristics and address)
  • The RISAF raises an interrupt signal to the IAC (careful: The NVIC does not handle the RISAF interrupt directly, but the IAC can use the RISAF interrupt to raise an IRQ that the NVIC handles)
  • Read accesses blocked by a RISAF return zero
  • Write accesses blocked by a RISAF are not written to target memory

The hello_world project takes the simplest possible approach: it configures the RISAF of each memory resource to allow any transaction, by defining two fully overlapping regions, one for secure requests and one for non-secure requests, that both span the full memory range.

For each RISAF, opening the “firewall” to all transactions (careful, this is dangerous!) is done by the function below, where get_risaf_max_addr returns the maximum address of the memory protected by the RISAF passed as parameter.

static void set_risaf_default(RISAF_TypeDef *risaf)
{
  RISAF_BaseRegionConfig_t risaf_conf;  
  risaf_conf.StartAddress = 0x0;
  risaf_conf.EndAddress   = get_risaf_max_addr(risaf); /* as the default config */
  risaf_conf.Filtering    = RISAF_FILTER_ENABLE; // Base region enable (otherwise access control is secure, privileged, trusted domain CID = 1)
  risaf_conf.PrivWhitelist  = RIF_CID_NONE; // apps running in all compartments can access to region in priv/unpriv mode
  risaf_conf.ReadWhitelist  = RIF_CID_MASK; // apps running in all compartments can R in this region
  risaf_conf.WriteWhitelist = RIF_CID_MASK; // apps running in all compartments can W in this region
  // Configure 2 regions with this config, fully overlapping, one for secure one for non secure accesses:
  risaf_conf.Secure = RIF_ATTRIBUTE_SEC;    // Only secure requests can access this region
  HAL_RIF_RISAF_ConfigBaseRegion(risaf, 0, &risaf_conf);
  risaf_conf.Secure = RIF_ATTRIBUTE_NSEC;    // Only non-secure requests can access this region
  HAL_RIF_RISAF_ConfigBaseRegion(risaf, 1, &risaf_conf);
}
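
To make the effect of this configuration easier to see, here is a deliberately simplified host-side model of the filtering rules described in this section. It is a sketch, not the RISAF hardware behavior: the types and function names are hypothetical, whitelists are modeled as bitmasks with bit n standing for CID n, and an access is granted as soon as one region accepts it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bus transaction as seen by the filter (hypothetical type). */
typedef struct {
  uint32_t addr;        /* offset inside the protected memory */
  uint8_t  cid;         /* compartment ID, 0..7 */
  bool     secure;
  bool     privileged;
  bool     write;
} txn_t;

/* Simplified base region (hypothetical type). */
typedef struct {
  uint32_t start, end;  /* inclusive bounds */
  bool     secure;      /* true: secure requests only, false: non-secure only */
  uint8_t  priv_wl;     /* CIDs restricted to privileged accesses */
  uint8_t  read_wl;     /* CIDs allowed to read */
  uint8_t  write_wl;    /* CIDs allowed to write */
} region_t;

static bool region_allows(const region_t *r, const txn_t *t)
{
  uint8_t bit = (uint8_t)(1u << t->cid);
  if (t->addr < r->start || t->addr > r->end) return false;
  if (t->secure != r->secure) return false;
  if ((r->priv_wl & bit) && !t->privileged) return false;
  return ((t->write ? r->write_wl : r->read_wl) & bit) != 0;
}

/* n == 0 models an unconfigured RISAF: secure, privileged, CID = 1 only. */
static bool risaf_allows(const region_t *regions, int n, const txn_t *t)
{
  assert(t->cid < 8);
  if (n == 0)
    return t->secure && t->privileged && t->cid == 1;
  for (int i = 0; i < n; i++)
    if (region_allows(&regions[i], t)) return true;
  return false;
}
```

With two full-range regions (one secure, one non-secure, all CIDs whitelisted for read and write), a non-secure, unprivileged write with CID 0 is accepted by this model, whereas the same access is rejected when no region is configured.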

Resource isolation framework security controller (RIFSC)

Most of the peripherals do not understand the notions of the Resource Isolation Framework: they are not “RIF-aware”.

RIFSC centralizes configuration of small blocks that are placed in front of all such peripherals to make them “RIF-compatible”.
- RISC registers configure the logic placed in front of bus slaves to filter accesses to peripherals registers.
- RIMC registers configure the logic placed in front of bus masters to assign them secure/privileged/compartment ID attributes on the AXI bus.

After a system reset, all non-RIF-aware peripherals are configurable by non-secure, unprivileged software and reachable by any CID. Likewise, all non-RIF-aware bus masters are non-secure and unprivileged, with CID = 0.

In the "hello_world" project, only the NPU is configured through the RISC/RIMC registers because it is the only peripheral used. The NPU is a master on the AXI bus, but also a slave, so both “firewalls” must be configured.
The configuration is done in the NPU_Config function, by the following snippet of code:

  RIMC_MasterConfig_t master_conf;
  master_conf.MasterCID = RIF_CID_1;
  master_conf.SecPriv = RIF_ATTRIBUTE_SEC | RIF_ATTRIBUTE_PRIV; // Privileged, secure
  HAL_RIF_RIMC_ConfigMasterAttributes(RIF_MASTER_INDEX_NPU, &master_conf);  
  HAL_RIF_RISC_SetSlaveSecureAttributes(RIF_RISC_PERIPH_INDEX_NPU, RIF_ATTRIBUTE_PRIV | RIF_ATTRIBUTE_SEC);

Here, the NPU “master” sends transactions with compartment ID 1 and with secure and privileged attributes. This lets NPU requests pass through an unconfigured RISAF, for example, but for security purposes it is recommended to use another CID for the NPU.
The NPU “slave” allows only secure and privileged accesses to the NPU (that is, a core in non-secure mode is not able to write into NPU configuration registers).

NOTE: The secure attribute configured in the NPU RIMC is effective if and only if the NPU RISC allows configuration of the NPU from the secure world only. This is the secure guard.

Illegal Access controller (IAC)

All detections of RIF-related problematic accesses spawned from above configurations and illegal accesses are centralized in the Illegal Access controller (IAC).

Illegal accesses caused by the RIF do not produce an NVIC-mapped IRQ if the IAC is not clocked and enabled, so it is generally good practice to activate it. This is useful for debugging or for recovering from illegal accesses detected by the RIF.

Each source of illegal access can be set to raise an IAC IRQ or can be masked. The handler can then be used to retrieve the peripheral that did the illegal access (and to which address it was done).
Details of the illegal access can be retrieved from the RISAF register that raised an interrupt on the IAC.

Secure / Non-Secure project - TrustZone® configuration

This project comes in two flavors: an IAR Embedded Workbench® IDE project and an STM32CubeIDE project.

Overview

An example project showing an inference performed in a TrustZone® environment is also provided.

In terms of NPU usage, this project has nothing particular.
The difficulty for the developer is that the various use-cases must be well defined in order to partition resources and memories correctly using the TrustZone® tools (SAU/IDAU, MPU) and the resource isolation framework.

In the provided example, the MCU boots into the TrustZone® secure-world (thanks to a first-stage-bootloader - fsbl).
The secure firmware takes care of configuring all the peripherals needed, and prohibits access to most of the peripherals to the non-secure firmware: the non-secure firmware “duty” is only to perform inferences, so it only needs access to memories, NPU, … The secure firmware also configures the RIF to allowlist non-secure accesses to memories needed by the inference.

The secure firmware then hands over to the non-secure firmware that performs inferences.

Preliminary note: how to compile/execute?

As in every TrustZone® project, the compilation process is a bit complex: the secure and non-secure firmwares are generated as two images (so that it is possible, for example, to update only the non-secure part of the firmware). If the non-secure image needs to access functions provided by the secure firmware, it cannot call them directly: non-secure code calling a function in a “secure” part of the memory generates a secure fault. The secure firmware must provide a wrapper containing a secure gate to explicitly allow such a call, and such wrapper functions must be placed in a “non-secure callable” section of the memory so that the non-secure world can call them without generating faults.

As such, it is mandatory to compile the “secure” project first, export the symbols of the functions placed in the non-secure callable section for the non-secure firmware, and then compile the non-secure firmware.
(If the secure project ever needs to access non-secure project entry points, it is also mandatory to recompile the secure project after exporting the non-secure symbols.)

Security considerations / TrustZone® on Armv8-M

This section briefly presents the attribution units and the MPU of Armv8-M.

SAU/IDAU

Two attribution units are present in the STM32N6 (as all Armv8-M with security extensions):

  • Implementation-Defined Attribution Unit (IDAU): Set at system level by STMicroelectronics, not programmable.
  • Security Attribution Unit (SAU): Programmable in a secure state.

These two attribution units allow partitioning the whole memory addressable range. Regions can be defined as Secure, Non-Secure, Non-Secure callable or exempt from security checking.

The final memory security attribution is the stricter of the attributions given by the IDAU and the SAU (that is, if an address is marked secure by the IDAU, configuring the SAU on this address as non-secure does not make it non-secure: it remains secure).
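
The “stricter wins” rule can be sketched in a few lines. The enum and function names are hypothetical; the ordering Non-Secure < Non-Secure callable < Secure follows the Armv8-M attribution rules, and the “exempt” case is left out of this sketch.

```c
#include <assert.h>

/* Security attributes ordered from least strict to most strict. */
typedef enum { ATTR_NS = 0, ATTR_NSC = 1, ATTR_S = 2 } sec_attr_t;

/* Final attribution: the stricter of the IDAU and SAU answers. */
static sec_attr_t final_attr(sec_attr_t idau, sec_attr_t sau)
{
  assert(idau <= ATTR_S && sau <= ATTR_S);
  return (idau > sau) ? idau : sau;
}
```

For instance, final_attr(ATTR_S, ATTR_NS) yields ATTR_S: the SAU cannot relax what the IDAU marks as secure.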

As a reminder, the IDAU assignment on the STM32N6 is as follows (the 256MB ranges at 0x00000000, 0x20000000, 0x40000000 (Non-Secure) are aliased at 0x10000000, 0x30000000, 0x50000000 (Secure))

Address region   IDAU assignment    Aliased targets
0xE-------       NS (non-secure)    CM55 internal peripherals
0xD-------       NS (non-secure)    SDRAM2
0xC-------       NS (non-secure)    SDRAM1
0x9-------       NS (non-secure)    XSPI1
0x8-------       NS (non-secure)    XSPI3
0x7-------       NS (non-secure)    XSPI2
0x6-------       NS (non-secure)    FMC NOR/SRAM
0x5-------       NSC (secure)       Peripherals
0x4-------       NS (non-secure)    Peripherals
0x3-------       NSC (secure)       Internal RAMs, DTCM
0x2-------       NS (non-secure)    Internal RAMs, DTCM
0x1-------       NSC (secure)       BootROM, ITCM
0x0-------       NS (non-secure)    BootROM, ITCM
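
Since the IDAU assignment depends only on the top four address bits, the table above can be mirrored by a simple lookup. This is an illustrative sketch with hypothetical names; address ranges that do not appear in the table are reported as “not listed” rather than guessed.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { IDAU_NS, IDAU_NSC, IDAU_NOT_LISTED } idau_attr_t;

/* IDAU assignment of an address, following the table above. */
static idau_attr_t idau_lookup(uint32_t addr)
{
  switch (addr >> 28) {
    case 0x1: case 0x3: case 0x5:
      return IDAU_NSC;
    case 0x0: case 0x2: case 0x4: case 0x6: case 0x7:
    case 0x8: case 0x9: case 0xC: case 0xD: case 0xE:
      return IDAU_NS;
    default:
      return IDAU_NOT_LISTED;   /* range absent from the table */
  }
}

/* Secure alias of an address in the 0x0, 0x2 or 0x4 non-secure ranges. */
static uint32_t secure_alias(uint32_t ns_addr)
{
  uint32_t top = ns_addr >> 28;
  assert(top == 0x0u || top == 0x2u || top == 0x4u);
  return ns_addr | 0x10000000u;
}
```

For example, secure_alias(0x24000000) gives 0x34000000, which idau_lookup reports as IDAU_NSC.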

SAU can then be configured to further partition the non-secure ranges above into Secure/NSC regions, and partition all the memory (external memories) that can be mapped from 0x60000000 upwards.

Important

- If the SAU is left unconfigured and disabled, it applies by default (depending on the ALLNS bit) a “secure-everywhere” policy, which prevents any access from the Non-Secure security state.
- When activated, any range of memory not configured by the SAU is marked Secure.

In the context of the project delivered, SAU configuration is done through partition_stm32n657xx.h that describes all the ranges for the SAU and their attributes. The function TZ_SAU_Setup is called in the early stages of the firmware, by SystemInit.

This file also configures an important characteristic of all IRQs: it defines whether an interrupt target security state is Secure or Non-Secure (for example, even if executing from the non-secure mode, if an IRQ is defined as secure, it triggers the Secure ISR).

After configuring the SAU, the following rules apply (X: access allowed):

CPU State    Execute from      Execute from         Fetch data in     Fetch data in
             Secure memory     Non-Secure memory    Secure memory     Non-Secure memory
Secure       X                 Secure Fault         X                 X
Non-Secure   Secure Fault      X                    Secure Fault      X

A call from the Non-Secure state to a Non-Secure-Callable (NSC) address does not raise a Secure Fault.
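
The access table above can be written as a small decision function. This is a host-side sketch with hypothetical names; it only covers the Secure and Non-Secure memory columns of the table and does not model the Non-Secure-Callable entry case.

```c
#include <stdbool.h>

typedef enum { CPU_SECURE, CPU_NONSECURE } cpu_state_t;
typedef enum { MEM_SECURE, MEM_NONSECURE } mem_attr_t;
typedef enum { ACC_EXECUTE, ACC_DATA } acc_kind_t;

/* Returns true when the access raises a Secure Fault according to the table. */
static bool raises_secure_fault(cpu_state_t cpu, mem_attr_t mem, acc_kind_t kind)
{
  if (cpu == CPU_SECURE) {
    /* Secure state: only executing from Non-Secure memory faults. */
    return kind == ACC_EXECUTE && mem == MEM_NONSECURE;
  }
  /* Non-Secure state: any access to Secure memory faults. */
  return mem == MEM_SECURE;
}
```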

Memory protection unit

The Armv8-M MPU defines access permissions and attributes for memory ranges.
The MPU complements the IDAU/SAU protection by checking other characteristics of transactions (R/W/X, privileged/unprivileged). It also makes it possible to enforce constraints on the transactions made to a given address in memory and to set the associated cacheability attributes.

For each range defined in the MPU configuration, it is required to define:

  • Base address and limit address
  • Access permission: read-only or read/write
  • Access permission: privileged mode only / privileged and unprivileged
  • Execute permission: execution permitted / execute never (XN)
  • Shareability: Generally not used with Cortex®-M (useful for multiprocessor systems)
  • Memory type:
    • Normal: with accesses that have no side-effects, that can be reordered on the AXI bus. Normal memories are further described by:
      • Cacheability: whether the accesses are cacheable or not. Cacheability is further defined by attributes for both inner caches (integrated caches) and outer caches:
        • Cache policy: Write-through or write-back
        • Allocation: Allocate/Don’t allocate for R/W accesses
        • Transient hint: hints to the cache that the data may only be needed temporarily
    • Device: with accesses that have side-effects. Among others: this type of memory is never cacheable, and accesses must be aligned. Device memories are further described by attributes:
      • Gathering/non-Gathering (G/nG): multiple accesses of the same type (R/W) can be merged (or not)
      • Reordering/non-Reordering (R/nR): transactions can be reordered or not (in nR regions, transactions must occur in order)
      • Early Write Ack or non-Early Write Ack (E/nE): nE attributes recommend that only the very endpoint acknowledges transactions (the transaction is seemingly slower, but the acknowledge is done by the receiver of the transaction).
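
In the Armv8-M MPU, the memory type and cache attributes listed above are packed into one 8-bit attribute per MAIR slot. The helpers below sketch that encoding on the host; their names are hypothetical, and a real project would normally rely on the CMSIS or HAL helpers instead.

```c
#include <assert.h>
#include <stdint.h>

/* 4-bit Normal-memory field: bit 3 = non-transient, bit 2 = write-back,
 * bit 1 = read-allocate, bit 0 = write-allocate.
 * (The special value 0b0100 encodes non-cacheable memory.) */
static uint8_t mair_normal_field(unsigned non_transient, unsigned write_back,
                                 unsigned read_alloc, unsigned write_alloc)
{
  return (uint8_t)(((non_transient & 1u) << 3) | ((write_back & 1u) << 2) |
                   ((read_alloc & 1u) << 1) | (write_alloc & 1u));
}

/* Normal memory: outer field in bits [7:4], inner field in bits [3:0]. */
static uint8_t mair_normal(uint8_t outer, uint8_t inner)
{
  assert(outer != 0u && outer <= 0xFu);  /* an outer field of 0 means Device */
  assert(inner != 0u && inner <= 0xFu);
  return (uint8_t)((outer << 4) | inner);
}

typedef enum { DEV_nGnRnE = 0, DEV_nGnRE = 1, DEV_nGRE = 2, DEV_GRE = 3 } dev_attr_t;

/* Device memory: bits [7:4] = 0, bits [3:2] = device type, bits [1:0] = 0. */
static uint8_t mair_device(dev_attr_t type)
{
  return (uint8_t)((unsigned)type << 2);
}
```

For example, Normal memory that is write-back, read-allocate, and non-transient for both inner and outer caches, assuming no write-allocate, encodes as 0xEE, and Device-nGnRE memory encodes as 0x04.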

The MPU can be fully disabled (no actions from it, and the memory attributes are set based on the default Armv8-M memory map).
Otherwise, once it is configured and enabled, MemManage Faults are raised when illegal accesses are made and caught by the MPU.

Configurations done in the project

FSBL

  • The MPU is configured:
    • Attributes configuration 0 is set to “Normal” memory with Write-Back/Read-Allocate/Non-transient hint policies
    • The address range 0x70000000-0x77FFFFFF:
      • Uses attributes 0
      • Can be used to execute code (the Secure/Non-Secure firmware can be executed from flash)
      • Only allows Read-Only accesses
      • Allows privileged or non-privileged accesses
  • Clock configuration is done in the FSBL
  • XSPI2 controller is set up to access external flash memory
  • The FSBL then jumps to the Secure code.
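
To illustrate what the MPU region described above amounts to at register level, the sketch below computes the Armv8-M MPU_RBAR and MPU_RLAR values for a given region on the host. The helper names are hypothetical, and the FSBL itself may well perform this configuration through HAL or CMSIS calls; shareability is assumed to be 0 (non-shareable) in the usage example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MPU_RBAR: base[31:5], SH[4:3], AP[2:1], XN[0].
 * AP bit 1 = read-only, AP bit 0 = unprivileged access allowed. */
static uint32_t mpu_rbar(uint32_t base, unsigned sh, bool read_only,
                         bool unpriv_ok, bool execute_never)
{
  assert((base & 0x1Fu) == 0u);   /* regions are 32-byte aligned */
  uint32_t ap = ((read_only ? 1u : 0u) << 1) | (unpriv_ok ? 1u : 0u);
  return base | ((sh & 3u) << 3) | (ap << 1) | (execute_never ? 1u : 0u);
}

/* MPU_RLAR: limit[31:5], AttrIndx[3:1], EN[0]. */
static uint32_t mpu_rlar(uint32_t limit, unsigned attr_index)
{
  assert(attr_index < 8u);
  return (limit & ~0x1Fu) | (attr_index << 1) | 1u;
}
```

With the region listed above (0x70000000-0x77FFFFFF, attributes 0, read-only, privileged and unprivileged, executable), these helpers give MPU_RBAR = 0x70000006 and MPU_RLAR = 0x77FFFFE1.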

Secure firmware

The secure firmware handles all the configurations that must be done on the secure side (except MPU, already done in the FSBL):

  • The SAU is configured by TZ_SAU_Setup:
    • Region 0 [NSC] contains all the non-secure callable functions (memory range defined at link time)
    • Region 1 [NS] 0x20000000 - 0x2FFFFFFF
    • Region 2 [NS] 0x40000000 - 0x4FFFFFFF
    • Region 3 [NS] 0x70180000 - 0x7FFFFFFF
    • All other regions are not configured and are therefore secure.
  • Power-management configurations: internal memories are powered-up, sleep mode gating is set up.
  • Memory-mapping of the external memories is done
  • The NPU is set up
  • The NPU cache (CACHEAXI) is set up
  • The NPU RIMC configuration is done: the NPU master is configured as “non-secure, privileged” – The NPU generates non-secure accesses on the bus
  • The NPU RISC configuration is done: the NPU slave is configured as “non-secure, privileged” – The NPU can be configured in non-secure mode
  • RISAF is configured:
    • Most of the memories are accessible in both secure and non-secure modes
    • External flash memory is configured with
      • Addresses from 0x70100000 to 0x7017FFFF as secure (the secure firmware is made to be flashed here)
      • Addresses from 0x70180000 onwards as non-secure (the non-secure firmware and the network parameters are to be stored there)
  • GPIOs are configured: RISC and pin attributes are set
    • The two LEDs RISC-related registers are configured to make the pins configurable in non-secure mode
    • The two LEDs pins attributes are set to non-secure.
  • Interrupts are set up to make illegal access violations debuggable
    • IAC is set up and the IAC IRQ is enabled
    • Busfault/Securefault handlers are activated
  • Then the secure firmware jumps to the non-secure firmware.
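
As a quick way to reason about the SAU partitioning listed above, the sketch below models SAU regions 1 to 3 on the host and tells whether an address is marked Non-Secure by the SAU. The type and function names are hypothetical, and region 0 (NSC) is left out because its range is only known at link time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t base, limit; } sau_ns_region_t;  /* inclusive bounds */

/* Non-Secure SAU regions 1 to 3 of the secure firmware, as listed above. */
static const sau_ns_region_t example_sau[] = {
  {0x20000000u, 0x2FFFFFFFu},
  {0x40000000u, 0x4FFFFFFFu},
  {0x70180000u, 0x7FFFFFFFu},
};

/* True when addr falls in one of the Non-Secure regions. Any address that is
 * not covered is Secure for an enabled SAU. */
static bool sau_marks_nonsecure(const sau_ns_region_t *regions, size_t count,
                                uint32_t addr)
{
  assert(regions != NULL || count == 0);
  for (size_t i = 0; i < count; i++)
    if (addr >= regions[i].base && addr <= regions[i].limit)
      return true;
  return false;
}
```

With these regions, 0x70100000 (secure firmware area of the external flash) is not marked Non-Secure, while 0x70180000 is.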

Inference done on the NS side

The Non-Secure software performs the inference on the NPU in the usual way.

Some callbacks related to secure interrupts are registered from the non-secure side on the secure side (through the NSC region): as a contract, the secure side offers ways to connect callbacks that are invoked whenever a security IRQ occurs.
If an access violation occurs on the NS side, the IRQ is handled by the secure firmware, and the secure interrupt handler then jumps to the callback (in the NS firmware).

Memory pool configuration

Special attention must be paid when crafting the memory pool file for the ST Neural-ART compiler in such configurations.

When doing an inference, the NPU issues transactions on the AXI bus with the attributes configured by the RIMC in front of it. If the NPU RIMC is configured to tag transactions as non-secure, then, for example, when the NPU fetches data, access to the target address must be granted to non-secure transactions (according to SAU, RISAF, etc).

During an inference, some epochs may not be handled directly by the NPU (SW epochs) and are executed by the MCU instead (through the EmbedNets lib). In that case, the MCU accesses memories directly. If the inference is done in non-secure mode, these transactions are also tagged as non-secure data fetches/writes.

To ease the use-case of doing inferences in Non-Secure:

  • SAU should be configured to fit the IDAU NS regions for memories (at least AXISRAMs + NPU RAMS, at 0x2-------)
  • RISAF should be configured to allow Non-Secure requests to those memory ranges
  • The memory pool file should refer only to Non-Secure memory regions configured in the SAU / RISAF (this means, for example, that any address starting with 0x3------- should be banned from a memory pool crafted for non-secure inference)
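
The last guideline can be checked mechanically before building. The sketch below is a hypothetical host-side helper, not a tool delivered with ST Edge AI Core: it flags any memory pool range that touches a secure alias (top address nibble 0x1, 0x3, or 0x5 in the IDAU table of this article).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when any byte of [start, start + size) lies in a secure alias. */
static bool pool_touches_secure_alias(uint32_t start, uint32_t size)
{
  assert(size > 0u);
  assert(size - 1u <= UINT32_MAX - start);  /* the range must not wrap around */
  uint32_t last = start + (size - 1u);
  for (uint32_t nib = start >> 28; nib <= (last >> 28); nib++) {
    if (nib == 0x1u || nib == 0x3u || nib == 0x5u)
      return true;
  }
  return false;
}
```

For instance, with made-up pool addresses, a pool at 0x34200000 is rejected for a non-secure inference, whereas the same pool declared at its 0x24200000 alias passes this check.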

NPU Validation

This project comes in two flavors: an IAR Embedded Workbench® IDE project and a makefile (based on gcc) project.

Overview

This project is extensively used in other articles related to ST Neural-ART NPU.

The only goal of this project is to be responsive to UART commands that can be sent by the AiRunner package.

It is mainly able to

  • Report information about the Neural-ART-compiled-network it contains
  • Feed data received from the UART to the network
  • Run an instrumented inference of the network (providing a detail of the duration taken for each step of each epoch of the inference)
  • Send back output data over UART

How to use

To tune the project, edit the Core/Inc/app_config.h file, which provides the following settings:

  • USE_MCU_DCACHE enables the CM55 data cache (useful to speed up data accesses from the CPU). Should be kept to 1.
  • USE_MCU_ICACHE enables the CM55 instruction cache (useful to speed up instruction fetching). Should be kept to 1.
  • USE_EXTERNAL_MEMORY_DEVICES enables the external RAM & flash chips of the discovery-kit board. When set to 0, the external memories and the XSPIs connected to them are not initialized, which is more power-efficient but greatly decreases storage capacity.
  • USE_OVERDRIVE “boosts” the capabilities of the MCU by increasing the power supply voltage and overclocking it.
  • NO_OVD_CLK400 can be used to force all clocks to 400 MHz (CPU/NIC/NOC/NPU) when overdrive is not set (this can be useful to debug clocks, or to reproduce the “same” clocking scheme on slower MCUs).
  • USE_UART_BAUDRATE is set to 921600 bps because a lot of data must be transmitted over UART when doing a full validation. It is possible to decrease this value but this is not recommended.
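
For illustration, a configuration following the recommendations above could look like the fragment below. The macro names come from the list above; the values chosen for USE_EXTERNAL_MEMORY_DEVICES, USE_OVERDRIVE, and NO_OVD_CLK400 are arbitrary examples, and the delivered app_config.h may contain other settings.

```c
/* Core/Inc/app_config.h: illustrative values only */
#define USE_MCU_DCACHE               1       /* keep the CM55 data cache enabled */
#define USE_MCU_ICACHE               1       /* keep the CM55 instruction cache enabled */
#define USE_EXTERNAL_MEMORY_DEVICES  1       /* initialize external RAM, flash and XSPIs */
#define USE_OVERDRIVE                0       /* example choice: normal mode */
#define NO_OVD_CLK400                0       /* example choice: do not force 400 MHz clocks */
#define USE_UART_BAUDRATE            921600  /* bps, value used for full validations */
```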

CM55 Validation

Foreword

This project is a clone of the NPU Validation project, but dedicated to testing the performance of a pure software implementation of the deployed model (i.e. not using the NPU). The ST Edge AI Core is used to generate the specialized files running only on the Arm® Cortex®-M55 core. Only the default software stack is considered; the ST Neural-ART stack is not involved.

Like the NPU Validation project, this project comes in two flavors: IAR Embedded Workbench® IDE and makefile (based on gcc) projects. It includes an associated script that provides a simple way to perform validations on STM32N6, similar to what can be done on other STM32 targets with embedded flash memory. Usage is similar to the previous project. Below is a quick start guide.

Limitations

  • Weights are necessarily stored in the external flash. No configuration is provided to store the weights in the internal RAMs.
  • Only the legacy c-api is supported. The --c-api st-ai option cannot be used.

Quickstart

The main flow is similar to the one presented in the NPU getting started. This part shows the differences with the NPU-enabled flow when evaluating a model without the NPU.

Generate

The generate command shall not be called with --st-neural-art and should be called with the following options:

$> stedgeai generate -m <model-file> --target stm32n6 --binary --address 0x71000000 --memory-pool ./mypool_N6.json
  • --binary is used to generate the weights in a binary file. It works together with the --address argument, which provides the base address requested by the user to store the weights. Note that the full flow currently works correctly for weights stored in external memory (addresses above 0x70000000)

  • --memory-pool specifies the addresses and ranges of memory available to store the activations of the C implementation of the network. An example of such a memory pool file (not to be confused with the .mpool files used by the Neural-ART compiler) is provided in:

    $STEDGEAI_CORE_DIR/Projects/STM32N6570-DK/Applications/CM55_Validation/mypool_N6.json

Warning

The --memory-pool option is mandatory to define the possible memory locations that can be used to store the activations. Internal RAM (close to the MCU) is used first, followed by the NPU RAMs and then the external RAM (see the contents of the mypool_N6.json file).

This command generates various files (similar to generation for STM32 chips without NPU) located in the st_ai_output folder.

Flash weights, compile the validation project

As for the “standard” NPU flow for the N6, deploying the outputs of the generate command can be done easily through a script: cm55_loader.py

This script works with a very similar configuration as the n6_loader script presented here:

  • config.json, identical to the n6_loader one: use the same file
  • config_cm55.json, similar to the n6_loader one, the only difference being that the “project” to target should be the CM55 Validation one.

Once those two config files are completed, it is possible to call the script to do all the steps required to perform a validation on target afterwards:

  • Copy useful files to the project
  • Flash the weights in external flash
  • Compile the project, and run it.

For example:

$> python cm55_loader.py --cm55-loader-config config_cm55.json
06/20/2025 10:54:41 AM  __main__ -- Preparing compiler GCC
06/20/2025 10:54:41 AM  __main__ -- Setting a breakpoint in main.c at line 119 (before the infinite loop)
06/20/2025 10:54:41 AM  __main__ -- Copying network.c to project: ~/stedgeai/my_output/network.c -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM  __main__ -- Copying network.h to project: ~/stedgeai/my_output/network.h -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM  __main__ -- Copying network_c_info.json to project: ~/stedgeai/my_output/network_c_info.json -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM  __main__ -- Copying network_config.h to project: ~/stedgeai/my_output/network_config.h -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM  __main__ -- Copying network_generate_report.txt to project: ~/stedgeai/my_output/network_generate_report.txt -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM  __main__ -- Copying network_data.c to project: ~/stedgeai/my_output/network_data.c -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM  __main__ -- Copying network_data.h to project: ~/stedgeai/my_output/network_data.h -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM  __main__ -- Copying network_data_params.c to project: ~/stedgeai/my_output/network_data_params.c -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM  __main__ -- Copying network_data_params.h to project: ~/stedgeai/my_output/network_data_params.h -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM  __main__ -- Copying network_data.bin to project: ~/stedgeai/my_output/network_data.bin -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM  __main__ -- Extracting weights location from c-files
06/20/2025 10:54:41 AM  __main__ --     Weights location: 0x71000000
06/20/2025 10:54:41 AM  __main__ -- Resetting the board...
06/20/2025 10:54:41 AM  __main__ -- Skipping flashing memory network_data.bin -- 10.320 kB
06/20/2025 10:54:41 AM  __main__ -- Building project (conf= N6-DK)
06/20/2025 10:55:08 AM  __main__ -- Running the program
06/20/2025 10:55:10 AM  __main__ -- Start operation achieved successfully

The validation firmware is now running, it is possible to validate the implementation.

Validate

The command line to validate is straightforward. Do not forget to pass the --memory-pool argument (otherwise, expect coherency check errors):

$> stedgeai validate -m st_mnist_v1_28_tfs_int8.tflite --target stm32n6 --mode target --desc serial:921600 --memory-pool ./mypool_N6.json
   ST.AI Profiling results v2.0 - "network"
  ---------------------------------------------------------------
  nb sample(s)      :   1                                        
  duration          :   0.986 ms by sample (0.986/0.986/0.000)   
  macc              :   1081896                                  
  cycles/MACC       :   0.73                                     
  CPU cycles        :   [788,872]                                
  used stack/heap   :   not monitored/0 bytes                    
  ---------------------------------------------------------------
   
   Inference time per node
   -----------------------------------------------------------------------------------------
   c_id    m_id   type                dur (ms)       %    cumul  CPU cycles    name         
   -----------------------------------------------------------------------------------------
   0       0      NL (0x107)             0.003    0.3%     0.3%  [   2,691 ]   ai_node_0    
   1       1      Conv2D (0x103)         0.137   13.9%    14.3%  [ 109,922 ]   ai_node_1    
   2       2      Pad (0x116)            0.006    0.6%    14.8%  [   4,403 ]   ai_node_2    
   3       2      Conv2D (0x103)         0.152   15.4%    30.2%  [ 121,354 ]   ai_node_3    
   4       3      Pad (0x116)            0.005    0.5%    30.7%  [   3,935 ]   ai_node_4    
   5       3      Conv2D (0x103)         0.432   43.8%    74.5%  [ 345,655 ]   ai_node_5    
   6       4      Pad (0x116)            0.007    0.7%    75.3%  [   5,667 ]   ai_node_6    
   7       4      Conv2D (0x103)         0.069    7.0%    82.3%  [  55,228 ]   ai_node_7    
   8       5      Conv2D (0x103)         0.110   11.2%    93.4%  [  88,127 ]   ai_node_8    
   9       6      Pool (0x10b)           0.033    3.4%    96.8%  [  26,608 ]   ai_node_9    
   10      7      Dense (0x104)          0.023    2.3%    99.1%  [  18,087 ]   ai_node_10   
   11      8      Softmax (0x10c)        0.008    0.8%    99.9%  [   6,399 ]   ai_node_11   
   12      9      NL (0x107)             0.001    0.1%   100.0%  [     796 ]   ai_node_12   
   -----------------------------------------------------------------------------------------
   total                                 0.986                   [ 788,872 ]                
   -----------------------------------------------------------------------------------------
   
   Statistic per tensor
   --------------------------------------------------------------------------------
   tensor   #   type[shape]:size        min     max      mean      std  name       
   --------------------------------------------------------------------------------
   I.0      1   u8[1,28,28,1]:784         1     255   126.273   74.791  input_1    
   O.0      1   f32[1,1,1,36]:144     0.000   0.957     0.028    0.157  output_1   
   --------------------------------------------------------------------------------

As can be seen from the Inference time per node table, the layers are not executed using NPU facilities: a standard software implementation of the model is used.
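
The summary figures of the report can be cross-checked with a little arithmetic. The helpers below are a hypothetical sketch (they are not part of the validation tooling): they recompute the cycles/MACC ratio and the CPU frequency implied by the reported cycle count and duration.

```c
#include <assert.h>
#include <stdint.h>

/* Average number of CPU cycles spent per multiply-accumulate operation. */
static double cycles_per_macc(uint64_t cpu_cycles, uint64_t macc)
{
  assert(macc > 0);
  return (double)cpu_cycles / (double)macc;
}

/* CPU frequency in MHz implied by a cycle count and a duration in ms
 * (cycles per microsecond equals MHz). */
static double implied_cpu_mhz(uint64_t cpu_cycles, double duration_ms)
{
  assert(duration_ms > 0.0);
  return (double)cpu_cycles / (duration_ms * 1000.0);
}
```

With the values of the report above (788,872 cycles, 1,081,896 MACC, 0.986 ms), cycles_per_macc gives about 0.73, as reported, and implied_cpu_mhz gives about 800 MHz.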