STM32N6 Example projects & tips for creating new projects
for STM32 target, based on ST Edge AI Core Technology 2.2.0
r1.1
Overview
This article describes typical applications that can be used as a starting point:
- “Hello World” - This project is a minimal project that shows the main steps needed to get a first inference. It focuses on the hardware settings required to enable the ST Neural-ART NPU.
- “Secure-Non-Secure project - TrustZone® configuration” - This advanced project runs the ST Neural-ART NPU in non-secure mode.
Two additional projects are presented. They can be used with a host application using a specific AiRunner package.
- “NPU_Validation” is a generic test application to evaluate a deployed model using NPU.
- “CM55_Validation” is a generic test application to evaluate a deployed model using Cortex-M55 only.
Project source location
Sources of the different projects can be found in the $STEDGEAI_CORE_DIR/Projects/STM32N6570-DK/Applications/ folder.
$STEDGEAI_CORE_DIR indicates the root location where the ST Edge AI Core components are installed.
Hello-world project
This project comes in two flavors: an IAR Embedded Workbench® IDE project and an STM32CubeIDE project.
Overview
This project demonstrates the essential steps to perform an initial inference using the ST Neural-ART NPU. It includes hardware initialization and software setup for running inferences on a deployed model. The project continuously performs inferences, measures the execution time of each inference, and sends the results via UART.
This section describes the main points of attention: setting up the clocks, the NPU subsystem, and the RIF components. The RIF components presented here are the resource isolation slave unit for address space protection (RISAF) and the resource isolation framework security controller (RIFSC).
Clocks
Overview
Clock configuration is required in every kind of project. The main guidelines provided here address the basic use case of running an inference on the NPU of the discovery-kit board.
- Internal clocks
  - The NPU is clocked by sysc_ck. The source of this clock is set by selecting SysClkSource (by setting SYSSWS of RCC_CFGR1).
  - The NPU RAMs (AXISRAM3/4/5/6) are clocked by sysd_ck. The source of this clock is set by selecting SysClkSource (by setting SYSSWS of RCC_CFGR1).
  - Setting SYSSWS to 0b11 allows having different clocks for the NPU and the NPU RAMs (this configuration is used in the example project).
- Externalized clocks
  - For the STM32N6 discovery-kit, the external RAM is connected to the XSPI1 peripheral (the source clock of this peripheral can be set in RCC_CCIPR6); the example project sets its source to ic4.
  - For the STM32N6 discovery-kit, the external flash is connected to the XSPI2 peripheral (the source clock of this peripheral can be set in RCC_CCIPR6); the example project sets its source to ic3.
Clock speeds
Maximum clock speeds (already shown here) are summarized below:
Max clock (MHz) | Normal mode | Overdrive mode* |
---|---|---|
CPU | 600 | 800 |
CPU rams | 400 | 400 |
CPU AXI interconnect | 400 | 400 |
NPU | 800 | 1000 |
NPU rams | 800 | 900 |
NPU AXI interconnect | 800 | 1000 |
(on STM32N6 discovery-kit) External RAM | 200 | 200 |
(on STM32N6 discovery-kit) External flash | 200 | 200 |
* Refer to the datasheet for more information
The values reported in the table above are frequencies that can be targeted when VDDCORE = 0.8 V (Normal mode) and VDDCORE = 0.9 V (Overdrive mode).
Overdrive mode allows “overclocking” the whole system when the power supply is raised slightly (within a tolerable range).
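The limits in the table above can be captured in a small lookup table and checked before committing to a clock tree configuration. The following is a minimal host-side sketch, not code from the example projects: the enum, the table, and the function name are hypothetical, and the values are simply copied from the table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clock-domain identifiers, one per row of the table. */
typedef enum {
    CLK_CPU, CLK_CPU_RAMS, CLK_CPU_AXI,
    CLK_NPU, CLK_NPU_RAMS, CLK_NPU_AXI,
    CLK_EXT_RAM, CLK_EXT_FLASH,
    CLK_DOMAIN_COUNT
} clk_domain_t;

/* Maximum frequencies in MHz: { normal mode, overdrive mode }. */
static const uint16_t clk_max_mhz[CLK_DOMAIN_COUNT][2] = {
    [CLK_CPU]       = { 600,  800 },
    [CLK_CPU_RAMS]  = { 400,  400 },
    [CLK_CPU_AXI]   = { 400,  400 },
    [CLK_NPU]       = { 800, 1000 },
    [CLK_NPU_RAMS]  = { 800,  900 },
    [CLK_NPU_AXI]   = { 800, 1000 },
    [CLK_EXT_RAM]   = { 200,  200 },  /* on STM32N6 discovery-kit */
    [CLK_EXT_FLASH] = { 200,  200 },  /* on STM32N6 discovery-kit */
};

/* Returns true when the requested frequency does not exceed the table limit. */
bool clk_target_is_valid(clk_domain_t domain, bool overdrive, uint32_t mhz)
{
    if ((unsigned)domain >= (unsigned)CLK_DOMAIN_COUNT) {
        return false;
    }
    return mhz <= clk_max_mhz[domain][overdrive ? 1 : 0];
}
```

For instance, `clk_target_is_valid(CLK_NPU, false, 1000)` is false, because 1000 MHz on the NPU is only listed for overdrive mode.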
Focus: doing an inference
When doing an inference with the advised options (LL_ATON_RT_MODE=LL_ATON_RT_ASYNC), the code calls WFE instructions, which switch the core to low-power modes. Unless the RCC is configured otherwise, this gates all the clocks connected to the NPU and the NPU RAMs, leaving no way for the NPU to complete its work (the inference never finishes).
It is therefore necessary to configure which clocks must not be gated when entering low-power modes, in particular:
- All internal memories used during the inference must remain clocked in low-power mode
- All external memories used during the inference must remain clocked in low-power mode (and their associated peripherals, XSPI1, XSPI2)
- The NPU must remain clocked
- The NPU cache (CACHEAXI) must remain clocked if used
Setting up the NPU
To properly set up the NPU:
- Clocks must be set up properly.
- The NPU must be clocked/reset.
- (The CACHEAXI must be clocked/reset if needed).
- Resource isolation framework (RIF) must be configured properly (after clocking the various IPs) by setting RISAF/RIFSC configurations.
CACHEAXI
The AXI cache (CACHEAXI) is placed on the AXI interconnect of the NPU to improve the performance of data traffic. The goal of this cache is to cache data accessed by the NPU in external memories.
The initial configuration of the cache is simple: it only needs to be clocked, reset, and enabled.
Resource isolation slave unit for address space protection (RISAF)
- RISAF is part of the RIF and is used to protect memory accesses.
- Each memory resource can be split into small regions, each with its own access rights.
- The number of available regions and the granularity of the ranges depend on the resource.
- The “access rights” that can be defined for each region are based on:
  - Secure / Non-Secure attribute of the transaction.
  - Privileged / non-privileged attribute of the transaction.
  - The “compartment ID” (CID) of the transaction (for memories connected to AXI bus).
  - Whether the access is a read or a write.
If RISAFs are not configured, only accesses that are secure,
privileged and with CID=1 are allowed.
As such, it is important to configure the RISAFs correctly before
attempting to access memories, or the RISAFs will likely filter out
some accesses.
Whenever a RISAF filters an access:
- The RISAF logs the access violation characteristics in its registers (transaction characteristics and address)
- The RISAF raises an interrupt signal to the IAC (careful: The NVIC does not handle the RISAF interrupt directly, but the IAC can use the RISAF interrupt to raise an IRQ that the NVIC handles)
- Read accesses blocked by a RISAF return zero
- Write accesses blocked by a RISAF are not written to target memory
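The filtering rules described above can be illustrated with a small host-side model. This is a simplified sketch for reasoning only: the types and function names are hypothetical, the region is reduced to the attributes named in this section, and real RISAF hardware has more detail than is modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Attributes carried by one bus transaction. */
typedef struct {
    bool    secure;
    bool    privileged;
    uint8_t cid;       /* compartment ID, assumed to fit in 0..7 */
    bool    is_write;
} bus_access_t;

/* Simplified, hypothetical view of one RISAF region. */
typedef struct {
    bool    filtering;      /* false: region left unconfigured */
    bool    secure;         /* true: accepts secure requests, false: non-secure */
    uint8_t priv_only_cids; /* bit n set: CID n must be privileged */
    uint8_t read_cids;      /* bit n set: CID n may read */
    uint8_t write_cids;     /* bit n set: CID n may write */
} risaf_region_model_t;

/* Decide whether the modeled region lets the access through. */
bool risaf_model_allows(const risaf_region_model_t *r, const bus_access_t *a)
{
    uint8_t cid_bit = (uint8_t)(1u << (a->cid & 7u));

    if (!r->filtering) {
        /* Unconfigured RISAF: only secure, privileged, CID = 1 passes. */
        return a->secure && a->privileged && a->cid == 1u;
    }
    if (a->secure != r->secure) {
        return false;
    }
    if ((r->priv_only_cids & cid_bit) != 0u && !a->privileged) {
        return false;
    }
    return ((a->is_write ? r->write_cids : r->read_cids) & cid_bit) != 0u;
}

/* Value seen by the requester: a blocked read returns zero. */
uint32_t risaf_model_read(const risaf_region_model_t *r,
                          const bus_access_t *a, uint32_t stored)
{
    return risaf_model_allows(r, a) ? stored : 0u;
}
```

With an unconfigured region, a non-secure read is rejected and `risaf_model_read` returns 0, which is the kind of silent failure that makes an early RISAF configuration important.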
The hello_world project takes the simplest possible approach and configures each RISAF of each memory resource to allow any transaction, by defining two fully overlapping ranges, one for secure and one for non-secure requests, that span the full memory range.
For each RISAF, opening the “firewall” to all transactions (careful, this is dangerous!) is done by the function below, where get_risaf_max_addr returns the maximum address of the memory protected by the RISAF passed as parameter.
static void set_risaf_default(RISAF_TypeDef *risaf)
{
  RISAF_BaseRegionConfig_t risaf_conf;
  risaf_conf.StartAddress = 0x0;
  risaf_conf.EndAddress = get_risaf_max_addr(risaf); /* as the default config */
  risaf_conf.Filtering = RISAF_FILTER_ENABLE; // Base region enable (otherwise access control is secure, privileged, trusted domain CID = 1)
  risaf_conf.PrivWhitelist = RIF_CID_NONE; // apps running in all compartments can access to region in priv/unpriv mode
  risaf_conf.ReadWhitelist = RIF_CID_MASK; // apps running in all compartments can R in this region
  risaf_conf.WriteWhitelist = RIF_CID_MASK; // apps running in all compartments can W in this region
  // Configure 2 regions with this config, fully overlapping, one for secure one for non secure accesses:
  risaf_conf.Secure = RIF_ATTRIBUTE_SEC; // Only secure requests can access this region
  HAL_RIF_RISAF_ConfigBaseRegion(risaf, 0, &risaf_conf);
  risaf_conf.Secure = RIF_ATTRIBUTE_NSEC; // Only non-secure requests can access this region
  HAL_RIF_RISAF_ConfigBaseRegion(risaf, 1, &risaf_conf);
}
Resource isolation framework security controller (RIFSC)
Most of the peripherals do not understand the notions of the Resource Isolation Framework: they are not “RIF-aware”.
RIFSC centralizes the configuration of small blocks that are placed in front of all such peripherals to make them “RIF-compatible”.
- RISC registers configure the logic placed in front of bus slaves to filter accesses to peripheral registers.
- RIMC registers configure the logic placed in front of bus masters to assign them secure/privileged/compartment ID attributes on the AXI bus.
After a system reset, all non-RIF-aware peripherals can be configured by non-secure and unprivileged software, and are reachable by any CID. After a system reset, all non-RIF-aware bus masters are non-secure and unprivileged, with CID = 0.
In the case of the "hello_world" project, only the NPU is configured using RISC/RIMC registers, because it is the only peripheral used. The NPU is a master on the AXI bus, but also a slave, so both “firewalls” must be configured.
The configuration is done in the NPU_Config function, by the following snippet of code:
RIMC_MasterConfig_t master_conf;
master_conf.MasterCID = RIF_CID_1;
master_conf.SecPriv = RIF_ATTRIBUTE_SEC | RIF_ATTRIBUTE_PRIV; // Privileged secure
HAL_RIF_RIMC_ConfigMasterAttributes(RIF_MASTER_INDEX_NPU, &master_conf);
HAL_RIF_RISC_SetSlaveSecureAttributes(RIF_RISC_PERIPH_INDEX_NPU, RIF_ATTRIBUTE_PRIV | RIF_ATTRIBUTE_SEC);
Here, the NPU “master” sends transactions with compartment ID 1 and with secure and privileged attributes. This lets NPU requests pass through an unconfigured RISAF, for example, but for security purposes it is recommended to use another CID for the NPU.
The NPU “slave” allows only secure and privileged accesses to the NPU (that is, a core in non-secure mode is not able to write into the NPU configuration registers).
NOTE: The secure attribute configured in the NPU RIMC is effective if and only if the NPU RISC allows configuration of the NPU only from the secure world. This is the secure guard.
Illegal Access controller (IAC)
All RIF-related illegal accesses detected as a result of the configurations above are centralized in the Illegal Access controller (IAC).
Illegal accesses detected by the RIF do not produce an NVIC-mapped IRQ if the IAC is not clocked and enabled, so it is generally good practice to activate it. This is useful for debugging, or for recovering from illegal accesses detected by the RIF.
Each source of illegal access can be set to raise an IAC IRQ or
can be masked. The handler can then be used to retrieve the
peripheral that did the illegal access (and to which address it was
done).
Details of the illegal access can be retrieved from the RISAF
register that raised an interrupt on the IAC.
Secure / Non-Secure project - TrustZone® configuration
This project comes in two flavors: an IAR Embedded Workbench® IDE project and an STM32CubeIDE project.
Overview
An example project showing an inference performed in a TrustZone® environment is also provided.
This project has nothing specific in terms of NPU usage. The difficulty for the developer is that the various use cases must be well defined in order to correctly partition resources and memories using the TrustZone® tools (SAU/IDAU, MPU) and the resource isolation framework.
In the provided example, the MCU boots into the TrustZone® secure world (through a first-stage bootloader, FSBL).
The secure firmware takes care of configuring all the peripherals
needed, and prohibits access to most of the peripherals to the
non-secure firmware: the non-secure firmware “duty” is only to
perform inferences, so it only needs access to memories, NPU, … The
secure firmware also configures the RIF to allowlist non-secure
accesses to memories needed by the inference.
The secure firmware then hands over to the non-secure firmware that performs inferences.
Preliminary note: how to compile/execute?
As in every TrustZone® project, the compilation process is a bit complex: secure and non-secure firmwares are generated as two images (so that it is possible to update the non-secure part of the firmware only for example). But if the non-secure image needs to access functions provided by the secure firmware, it cannot call them directly (a non-secure code calling a function in a “secure” part of the memory generates a secure fault): the secure firmware must provide a wrapper with a secure gate in it to explicitly allow such a call; such wrapper functions shall be placed in a “non-secure callable” section of the memory to be called by the non-secure world without generating faults.
As such, it is mandatory to compile the “secure” project first, export the symbols of the non-secure-callable functions for the non-secure firmware, and then compile the non-secure firmware.
(If the secure project ever needs to access non-secure project entry points, it is also mandatory to recompile the secure project after exporting the non-secure symbols.)
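As an illustration of the wrapper mechanism described above, the Arm C Language Extensions for the Armv8-M Security Extension (CMSE) let a secure function be declared as a non-secure-callable entry point. The sketch below is hypothetical (the function name and the counter are not taken from the example project). It uses the GCC/Clang attribute spelling, which assumes a build with CMSE enabled (for example `-mcmse`); on any other build, the macro falls back to a plain function so that the code can still be compiled and exercised on a host.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_CMSE) && (__ARM_FEATURE_CMSE == 3U)
#include <arm_cmse.h>
/* Secure build: the toolchain emits a secure gateway veneer for this entry. */
#define NSC_ENTRY __attribute__((cmse_nonsecure_entry))
#else
/* Host or non-CMSE build: plain function, for illustration only. */
#define NSC_ENTRY
#endif

/* Hypothetical state that lives in secure memory only. */
static uint32_t secure_counter;

/* Hypothetical service exposed to the non-secure firmware. */
NSC_ENTRY uint32_t secure_counter_increment(uint32_t step)
{
    secure_counter += step;
    return secure_counter;
}
```

The non-secure firmware only sees the exported symbol of the entry point, which is why the secure project has to be built first.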
Security considerations / TrustZone® on Armv8-M
This section briefly presents the attribution units and the MPU of Armv8-M.
SAU/IDAU
Two attribution units are present in the STM32N6 (as in all Armv8-M devices with security extensions):
- Implementation-Defined Attribution Unit (IDAU): Set at system level by STMicroelectronics, not programmable.
- Security Attribution Unit (SAU): Programmable in a secure state.
These two attribution units allow partitioning the whole memory addressable range. Regions can be defined as Secure, Non-Secure, Non-Secure callable or exempt from security checking.
The final memory security attribution is the stricter of the attributions given by the IDAU and the SAU (that is, if an address is marked secure by the IDAU, configuring the SAU to mark this address as non-secure does not make it non-secure: it stays secure).
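The "stricter of the two" rule can be written as a one-line function once the attributes are ordered from least to most restrictive. This is a small host-side sketch with hypothetical names; regions exempt from security checking are deliberately left out.

```c
/* Ordered from least to most restrictive. */
typedef enum {
    ATTR_NON_SECURE = 0,
    ATTR_NON_SECURE_CALLABLE = 1,
    ATTR_SECURE = 2
} sec_attr_t;

/* The effective attribution is the more restrictive of IDAU and SAU. */
sec_attr_t effective_attribution(sec_attr_t idau, sec_attr_t sau)
{
    return (idau > sau) ? idau : sau;
}
```

For example, an address that the IDAU marks as non-secure remains secure as long as the SAU marks it secure, and an address that the IDAU marks as NSC cannot be downgraded to non-secure by the SAU.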
As a reminder, the IDAU assignment on the STM32N6 is as follows (the 256MB ranges at 0x00000000, 0x20000000, 0x40000000 (Non-Secure) are aliased at 0x10000000, 0x30000000, 0x50000000 (Secure))
Address region | IDAU assignment | Aliased targets |
---|---|---|
0xE------- | NS (non-secure) | CM55 internal peripherals |
0xD------- | NS (non-secure) | SDRAM2 |
0xC------- | NS (non-secure) | SDRAM1 |
0x9------- | NS (non-secure) | XSPI1 |
0x8------- | NS (non-secure) | XSPI3 |
0x7------- | NS (non-secure) | XSPI2 |
0x6------- | NS (non-secure) | FMC NOR/SRAM |
0x5------- | NSC (secure) | Peripherals |
0x4------- | NS (non-secure) | Peripherals |
0x3------- | NSC (secure) | Internal RAMs, DTCM |
0x2------- | NS (non-secure) | Internal RAMs, DTCM |
0x1------- | NSC (secure) | BootROM, ITCM |
0x0------- | NS (non-secure) | BootROM, ITCM |
SAU can then be configured to further partition the non-secure ranges above into Secure/NSC regions, and partition all the memory (external memories) that can be mapped from 0x60000000 upwards.
Important
- If left unconfigured and disabled, the SAU by default (depending on the ALLNS bit) applies a “secure-everywhere” policy, thus preventing any access from the Non-Secure state.
- When activated, any range of memory not configured in the SAU is marked Secure.
In the delivered project, the SAU configuration is done through partition_stm32n657xx.h, which describes all the ranges for the SAU and their attributes. The function TZ_SAU_Setup is called in the early stages of the firmware, by SystemInit.
This file also configures an important characteristic of all IRQs: it defines whether an interrupt target security state is Secure or Non-Secure (for example, even if executing from the non-secure mode, if an IRQ is defined as secure, it triggers the Secure ISR).
After configuring the SAU, the allowed accesses are as follows (X means that the access is allowed):
CPU State | Execute from Secure memory | Execute from Non-Secure memory | Fetch data in Secure memory | Fetch data in Non-Secure memory |
---|---|---|---|---|
Secure | X | Secure Fault | X | X |
Non-Secure | Secure Fault | X | Secure Fault | X |
A call from the Non-Secure state to a Non-Secure-Callable (NSC) address does not raise a Secure Fault, provided that it targets a secure gateway.
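The access table and the NSC rule can be condensed into a small decision function. This is a host-side sketch with hypothetical names that only encodes what the table and the sentence above state. It assumes that an instruction fetch into an NSC region lands on a valid secure gateway.

```c
#include <stdbool.h>

typedef enum { MEM_NON_SECURE, MEM_NSC, MEM_SECURE } mem_attr_t;
typedef enum { ACCESS_OK, ACCESS_SECURE_FAULT } access_result_t;

/* is_fetch: true for instruction execution, false for a data access. */
access_result_t check_access(bool cpu_secure, mem_attr_t target, bool is_fetch)
{
    if (cpu_secure) {
        /* Secure state: everything allowed except executing non-secure code. */
        if (is_fetch && target == MEM_NON_SECURE) {
            return ACCESS_SECURE_FAULT;
        }
        return ACCESS_OK;
    }
    /* Non-secure state. */
    if (target == MEM_NON_SECURE) {
        return ACCESS_OK;
    }
    if (is_fetch && target == MEM_NSC) {
        return ACCESS_OK; /* call through a secure gateway (assumed valid) */
    }
    return ACCESS_SECURE_FAULT;
}
```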
Memory protection unit
The Armv8-M MPU can define access permissions for memory ranges, as well as attributes for each range.
The MPU complements the IDAU/SAU protection by checking other characteristics of transactions (R/W/X, privileged/unprivileged). It can also enforce constraints on the transactions made to a given address in memory, and set the associated cacheability attributes.
For each range defined in the MPU configuration, it is required to define:
- Base address and limit address
- Access permission: read-only or read/write
- Access permission: privileged mode only / privileged and unprivileged
- Execute permission: execution permitted / execute never (XN)
- Shareability: Generally not used with Cortex®-M (useful for multiprocessor systems)
- Memory type:
  - Normal: with accesses that have no side-effects, that can be reordered on the AXI bus. Normal memories are further described by:
    - Cacheability: Are the accesses cacheable or not. Cacheability is further defined by attributes for both inner (integrated) caches and outer caches:
      - Cache policy: Write-through or write-back
      - Allocation: Allocate/Don’t allocate for R/W accesses
      - Transient hint: hints to the cache that the data may only be needed temporarily
  - Device: with accesses that have side-effects. Among others: this type of memory is never cacheable, and accesses must be aligned. Device memories are further described by attributes:
    - Gathering/non-Gathering (G/nG): multiple accesses of the same type (R/W) can be merged (or not)
    - Reordering/non-Reordering (R/nR): transactions can be reordered or not (in nR regions, transactions must occur in order)
    - Early Write Ack or non-Early Write Ack (E/nE): nE attributes recommend that only the very endpoint acknowledges transactions (the transaction is seemingly slower, but the acknowledge is done by the receiver of the transaction).
The MPU can be fully disabled (it then takes no action, and the memory attributes are set based on the default Armv8-M memory map).
Otherwise, once it is configured and enabled, MemManage faults are raised when illegal accesses are made and caught by the MPU.
Configurations done in the project
FSBL
- The MPU is configured:
  - Attributes configuration 0 is set to “Normal” memory with Write-Back/Read-Allocate/Non-transient hint policies
  - The address range 0x70000000-0x77FFFFFF:
    - Uses attributes 0
    - Can be used to execute code (the Secure/Non-Secure firmware can be executed from flash)
    - Only allows Read-Only accesses
    - Allows privileged or non-privileged accesses
- Clock configuration is done in the FSBL
- XSPI2 controller is set up to access external flash memory
- The FSBL then jumps to the Secure code.
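The FSBL MPU settings listed above can be expressed as Armv8-M MPU register values. The sketch below recomputes them on a host using the architectural field layout of MAIR, RBAR, and RLAR. It is not the FSBL code: the helper names are hypothetical, and it assumes that the inner and outer attributes are identical, that write-allocate is off (the list only mentions read-allocate), and that the region is non-shareable.

```c
#include <stdint.h>

/* One 4-bit Normal-memory attribute: bit3 non-transient, bit2 write-back,
   bit1 read-allocate, bit0 write-allocate. */
uint8_t mair_normal_nibble(unsigned nt, unsigned wb, unsigned ra, unsigned wa)
{
    return (uint8_t)(((nt & 1u) << 3) | ((wb & 1u) << 2) |
                     ((ra & 1u) << 1) | (wa & 1u));
}

/* Full 8-bit MAIR attribute: outer in bits [7:4], inner in bits [3:0]. */
uint8_t mair_attr(uint8_t outer, uint8_t inner)
{
    return (uint8_t)(((outer & 0xFu) << 4) | (inner & 0xFu));
}

/* RBAR: base [31:5], shareability [4:3], read-only [2],
   unprivileged access allowed [1], execute-never [0]. */
uint32_t mpu_rbar(uint32_t base, unsigned sh, unsigned ro, unsigned np, unsigned xn)
{
    return (base & 0xFFFFFFE0u) | ((sh & 3u) << 3) |
           ((ro & 1u) << 2) | ((np & 1u) << 1) | (xn & 1u);
}

/* RLAR: limit [31:5], attribute index [3:1], region enable [0]. */
uint32_t mpu_rlar(uint32_t limit, unsigned attr_idx)
{
    return (limit & 0xFFFFFFE0u) | ((attr_idx & 7u) << 1) | 1u;
}
```

Under these assumptions, attribute 0 evaluates to 0xEE, and the 0x70000000-0x77FFFFFF region (read-only, privileged and unprivileged, executable, attribute 0) gives RBAR = 0x70000006 and RLAR = 0x77FFFFE1. On target, such values would be programmed through the MPU driver in use rather than computed by hand.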
Secure firmware
The secure firmware handles all the configurations that must be done on the secure side (except MPU, already done in the FSBL):
- The SAU is configured by TZ_SAU_Setup:
  - Region 0 [NSC] contains all the non-secure callable functions (memory range defined at link time)
  - Region 1 [NS] 0x20000000-0x2FFFFFFF
  - Region 2 [NS] 0x40000000-0x4FFFFFFF
  - Region 3 [NS] 0x70180000-0x7FFFFFFF
  - All other regions are not configured and are therefore secure regions.
- Power-management configurations: internal memories are powered-up, sleep mode gating is set up.
- Memory-mapping of the external memories is done
- The NPU is set up
- The NPU cache (CACHEAXI) is set up
- The NPU RIMC configuration is done: the NPU master is configured as “non-secure, privileged”, so the NPU generates non-secure accesses on the bus
- The NPU RISC configuration is done: the NPU slave is configured as “non-secure, privileged”, so the NPU can be configured in non-secure mode
- RISAF is configured:
  - Most of the memories are accessible in both secure and non-secure modes
  - External flash memory is configured with:
    - Addresses from 0x70100000 to 0x7017FFFF as secure (the secure firmware is made to be flashed here)
    - Addresses from 0x70180000 onwards as non-secure (the non-secure firmware and the network parameters are to be stored there)
- GPIOs are configured: RISC and pin attributes are set
  - The RISC-related registers of the two LEDs are configured to make the pins configurable in non-secure mode
  - The pin attributes of the two LEDs are set to non-secure.
- Interrupts are set up to make illegal-access violations debuggable
  - IAC is set up and the IAC IRQ is enabled
  - Busfault/Securefault handlers are activated
- Then the secure firmware jumps to the non-secure firmware.
Inference done on the NS side
The Non-Secure software uses the NPU to perform an inference in the usual way.
Some callbacks related to secure interrupts are registered from the non-secure side on the secure side (by using the NSC region): as a contract, the secure side offers ways to connect callbacks that are invoked whenever a security IRQ occurs.
From the NS side, if an access violation occurs, the IRQ is handled by the secure firmware; the secure interrupt handler then jumps to the callback (in the NS firmware).
Memory pool configuration
Special attention must be paid when crafting the memory pool file for the ST Neural-ART compiler in such configurations.
When doing an inference, the NPU issues transactions on the AXI bus that carry the attributes configured by the RIMC in front of it. If the NPU RIMC is configured to tag transactions as non-secure, then, for example, when the NPU fetches data, access to the target address must be granted to non-secure requests (according to SAU, RISAF, etc.).
When doing an inference, some epochs may not be handled directly by the NPU (SW epochs) and are executed by the MCU instead (through the EmbedNets lib). In this case, the MCU accesses memories directly. If the inference is done in non-secure mode, these transactions are also tagged as non-secure data fetches/writes.
To ease the use-case of doing inferences in Non-Secure:
- SAU should be configured to fit the IDAU NS regions for memories (at least AXISRAMs + NPU RAMs, at 0x2-------)
- RISAF should be configured to allow Non-Secure requests to those memory ranges
- The memory pool file should reference only Non-Secure memory regions configured in the SAU / RISAF (this means, for example, that any address starting with 0x3------- should be banned from a memory pool crafted for non-secure inference)
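A memory pool can be checked against these rules before it is handed to the compiler. The sketch below is a hypothetical host-side helper: it tests whether an address range lies entirely inside one of the non-secure SAU regions listed for the secure firmware of this project. It covers only the SAU side; the RISAF configuration still has to allow non-secure requests to the same range.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t start;
    uint32_t end;   /* inclusive */
} addr_range_t;

/* Non-secure SAU regions 1 to 3 as listed for the secure firmware. */
static const addr_range_t sau_ns_regions[] = {
    { 0x20000000u, 0x2FFFFFFFu },
    { 0x40000000u, 0x4FFFFFFFu },
    { 0x70180000u, 0x7FFFFFFFu },
};

/* True when [start, start + size) fits inside a single non-secure region. */
bool pool_is_non_secure(uint32_t start, uint32_t size)
{
    if (size == 0u) {
        return false;
    }
    uint32_t end = start + (size - 1u);
    if (end < start) {
        return false; /* range wraps around the address space */
    }
    for (size_t i = 0; i < sizeof sau_ns_regions / sizeof sau_ns_regions[0]; i++) {
        if (start >= sau_ns_regions[i].start && end <= sau_ns_regions[i].end) {
            return true;
        }
    }
    return false;
}
```

With these regions, a pool placed in the 0x2------- range passes, while the same offset in the 0x3------- secure alias is rejected. The pool addresses used to try the helper are illustrative and do not come from an actual memory pool file.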
NPU Validation
This project comes in two flavors: an IAR Embedded Workbench® IDE project and a makefile (based on gcc) project.
Overview
This project is extensively used in other articles related to ST Neural-ART NPU.
The only goal of this project is to respond to UART commands sent by the AiRunner package.
It is mainly able to:
- Report information about the Neural-ART-compiled-network it contains
- Feed data received from the UART to the network
- Run an instrumented inference of the network (providing a detail of the duration taken for each step of each epoch of the inference)
- Send back output data over UART
How to use
To tune the project, it is possible to edit the Core/Inc/app_config.h file. This file provides various settings for the project:
- USE_MCU_DCACHE enables the CM55 data cache (useful to speed up data accesses from the CPU). Should be kept to 1.
- USE_MCU_ICACHE enables the CM55 instruction cache (useful to speed up instruction fetching). Should be kept to 1.
- USE_EXTERNAL_MEMORY_DEVICES enables the external RAM & flash chips on the discovery-kit board. When set to 0, the external memories and the XSPIs connected to them are not initialized, which is more power-efficient but greatly decreases storage capacity.
- USE_OVERDRIVE “boosts” the capabilities of the MCU by increasing the power supply voltage and overclocking it.
- NO_OVD_CLK400 can be used to force all clocks at 400MHz (CPU/NIC/NOC/NPU) when no overdrive is set (this can be useful to debug clocks, or to reproduce the “same” clocking scheme on slower MCUs).
- USE_UART_BAUDRATE is set to 921600 bps because a lot of data must be transmitted over UART when doing a full validation. It is possible to decrease this value but this is not recommended.
CM55 Validation
Foreword
This project is a clone of the NPU Validation project, but dedicated for testing the performance of pure software implementation of the deployed model (i.e. not using the NPU). The ST Edge AI Core is used to generate the specialized files running only on the Arm® Cortex® M55 core. Only the default software stack is considered, the ST Neural ART stack is not involved.
Like the NPU Validation project, this project comes in two flavors: IAR Embedded Workbench® IDE and makefile (based on gcc) projects. It includes an associated script that provides a simple way to perform validations on STM32N6, similar to what can be done on other STM32 targets with embedded flash memory. Usage is similar to the previous project. Below is a quick start guide.
Limitations
- Weights are necessarily stored in the external flash. No configuration is provided to store the weights in the internal RAMs.
- Only the legacy c-api is supported. The --c-api st-ai option cannot be used.
Quickstart
The main flow is similar to the one presented in the NPU getting started. This part shows how evaluating a model without the NPU differs from the NPU-enabled flow.
Generate
The generate command shall not be called with --st-neural-art and should be called with the following options:
$> stedgeai generate -m <model-file> --target stm32n6 --binary --address 0x71000000 --memory-pool ./mypool_N6.json
- --binary is used to generate the weights in a binary file. It works together with the --address argument, which provides the base address requested by the user to store the weights. Note that the full flow currently works correctly for weights stored in external memory (address above 0x70000000).
- --memory-pool specifies the addresses and ranges of memory available to store the activations of the C implementation of the network. An example of such a memory-pool file (not to be confused with the .mpool files used by the Neural-ART compiler!) is provided in: $STEDGEAI_CORE_DIR/Projects/STM32N6570-DK/Applications/CM55_Validation/mypool_N6.json
Warning
The --memory-pool option is mandatory to define the possible memory locations that can be used to store the activations. Internal RAM (close to the MCU) is used first, followed by the NPU RAMs and then the external RAM (see the contents of the mypool_N6.json file).
This command generates various files (similar to the generation for STM32 chips without NPU), located in the st_ai_output folder.
Flash weights, compile the validation project
As for the “standard” NPU flow for the N6, deploying the outputs of the generate command can be done easily through a script: cm55_loader.py
This script works with a configuration very similar to that of the n6_loader script, based on two files:
- config.json, identical to the n6_loader one: use the same file
- config_cm55.json, similar to the n6_loader one, the only difference being that the “project” to target should be the CM55 Validation one.
Once those two config files are completed, it is possible to call the script to do all the steps required before performing a validation on target:
- Copy useful files to the project
- Flash the weights in external flash
- Compile the project, and run it.
For example:
$ python cm55_loader.py --cm55-loader-config config_cm55l.json
06/20/2025 10:54:41 AM __main__ -- Preparing compiler GCC
06/20/2025 10:54:41 AM __main__ -- Setting a breakpoint in main.c at line 119 (before the infinite loop)
06/20/2025 10:54:41 AM __main__ -- Copying network.c to project: ~/stedgeai/my_output/network.c -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM __main__ -- Copying network.h to project: ~/stedgeai/my_output/network.h -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM __main__ -- Copying network_c_info.json to project: ~/stedgeai/my_output/network_c_info.json -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM __main__ -- Copying network_config.h to project: ~/stedgeai/my_output/network_config.h -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM __main__ -- Copying network_generate_report.txt to project: ~/stedgeai/my_output/network_generate_report.txt -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM __main__ -- Copying network_data.c to project: ~/stedgeai/my_output/network_data.c -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM __main__ -- Copying network_data.h to project: ~/stedgeai/my_output/network_data.h -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM __main__ -- Copying network_data_params.c to project: ~/stedgeai/my_output/network_data_params.c -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM __main__ -- Copying network_data_params.h to project: ~/stedgeai/my_output/network_data_params.h -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM __main__ -- Copying network_data.bin to project: ~/stedgeai/my_output/network_data.bin -> ~/stedgeai/Projects/STM32N6570-DK/Applications/CM55_Validation/X-CUBE-AI/App
06/20/2025 10:54:41 AM __main__ -- Extracting weights location from c-files
06/20/2025 10:54:41 AM __main__ -- Weights location: 0x71000000
06/20/2025 10:54:41 AM __main__ -- Resetting the board...
06/20/2025 10:54:41 AM __main__ -- Skipping flashing memory network_data.bin -- 10.320 kB
06/20/2025 10:54:41 AM __main__ -- Building project (conf= N6-DK)
06/20/2025 10:55:08 AM __main__ -- Running the program
06/20/2025 10:55:10 AM __main__ -- Start operation achieved successfully
The validation firmware is now running, and it is possible to validate the implementation.
Validate
The command line to validate is straightforward. Do not forget to pass the memory-pool argument (otherwise, expect coherency check errors):
$> stedgeai validate -m st_mnist_v1_28_tfs_int8.tflite --target stm32n6 --mode target --desc serial:921600 --memory-pool ./mypool_N6.json
ST.AI Profiling results v2.0 - "network"
---------------------------------------------------------------
nb sample(s) : 1
duration : 0.986 ms by sample (0.986/0.986/0.000)
macc : 1081896
cycles/MACC : 0.73
CPU cycles : [788,872]
used stack/heap : not monitored/0 bytes
---------------------------------------------------------------
Inference time per node
-----------------------------------------------------------------------------------------
c_id m_id type dur (ms) % cumul CPU cycles name
-----------------------------------------------------------------------------------------
0 0 NL (0x107) 0.003 0.3% 0.3% [ 2,691 ] ai_node_0
1 1 Conv2D (0x103) 0.137 13.9% 14.3% [ 109,922 ] ai_node_1
2 2 Pad (0x116) 0.006 0.6% 14.8% [ 4,403 ] ai_node_2
3 2 Conv2D (0x103) 0.152 15.4% 30.2% [ 121,354 ] ai_node_3
4 3 Pad (0x116) 0.005 0.5% 30.7% [ 3,935 ] ai_node_4
5 3 Conv2D (0x103) 0.432 43.8% 74.5% [ 345,655 ] ai_node_5
6 4 Pad (0x116) 0.007 0.7% 75.3% [ 5,667 ] ai_node_6
7 4 Conv2D (0x103) 0.069 7.0% 82.3% [ 55,228 ] ai_node_7
8 5 Conv2D (0x103) 0.110 11.2% 93.4% [ 88,127 ] ai_node_8
9 6 Pool (0x10b) 0.033 3.4% 96.8% [ 26,608 ] ai_node_9
10 7 Dense (0x104) 0.023 2.3% 99.1% [ 18,087 ] ai_node_10
11 8 Softmax (0x10c) 0.008 0.8% 99.9% [ 6,399 ] ai_node_11
12 9 NL (0x107) 0.001 0.1% 100.0% [ 796 ] ai_node_12
-----------------------------------------------------------------------------------------
total 0.986 [ 788,872 ]
-----------------------------------------------------------------------------------------
Statistic per tensor
--------------------------------------------------------------------------------
tensor # type[shape]:size min max mean std name
--------------------------------------------------------------------------------
I.0 1 u8[1,28,28,1]:784 1 255 126.273 74.791 input_1
O.0 1 f32[1,1,1,36]:144 0.000 0.957 0.028 0.157 output_1
--------------------------------------------------------------------------------
As can be seen from the Inference time per node table, the layers are not implemented using NPU facilities: a standard software implementation is used for the model.
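The summary figures of the report can be cross-checked from its raw counters. The sketch below uses hypothetical helper names and simply redoes the arithmetic: cycles per MACC is the CPU cycle count divided by the MACC count, and dividing the cycle count by the duration gives the CPU frequency implied by the measurement.

```c
#include <stdint.h>

/* Average number of CPU cycles spent per multiply-accumulate operation. */
double cycles_per_macc(uint64_t cpu_cycles, uint64_t macc)
{
    return (double)cpu_cycles / (double)macc;
}

/* CPU frequency implied by a cycle count and a duration.
   Cycles per microsecond is numerically equal to MHz. */
double implied_cpu_mhz(uint64_t cpu_cycles, double duration_ms)
{
    return (double)cpu_cycles / (duration_ms * 1000.0);
}
```

With the values of the report above (788,872 cycles, 1,081,896 MACC, 0.986 ms), the first function gives about 0.73, matching the cycles/MACC line, and the second gives about 800 MHz. That would be consistent with a CPU running at its overdrive frequency, although this is an inference from the log and not something the report states.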