ST Neural-ART NPU - Runtime loadable model support
for STM32 target, based on ST Edge AI Core Technology 2.2.0
r1.2
Introduction
Based on the Python scripts v1.3 in the $STEDGEAI_CORE_DIR/scripts/N6_reloc directory.
What is a runtime loadable model?
A runtime loadable model, also known as a relocatable binary model, is a binary object that can be installed and executed at runtime within an STM32 memory subsystem. This model includes a compiled version of the generated neural network (NN) C-files, encompassing the necessary forward kernel functions and weights/parameters. Its primary purpose is to offer a flexible method for upgrading AI-based applications without the need to regenerate and flash the entire end-user firmware. This approach is particularly useful for technologies such as firmware over-the-air (FOTA).
The binary object can be seen as a lightweight plug-in, capable of running from any address (position-independent code) and storing its data anywhere in memory (position-independent data). An efficient and minimal dynamic loader enables the instantiation and usage of this model. Unlike traditional systems, the STM32 firmware does not embed a complex and resource-intensive dynamic linker for Arm Cortex®-M microcontrollers. The generated object is mainly self-contained, requiring limited and well-defined external symbols (NPU RT dependent) at runtime.
In comparison with the software-only STM32 solution (fully self-contained object), the runtime loadable model for the Neural-ART accelerator must be installed inside an NPU runtime software stack. This allows supporting multiple instances, including the static version. No specific synchronization mechanism is embedded in the relocatable part; instead, the scheduling and access to the hardware resources (NPU subsystem) are managed by the stack itself (static part).
The generated runtime loadable model is a container that includes relocatable code (text/data/rodata sections) to configure the various NPU epochs. It mainly comprises a compiled (MCU-dependent) version of the specialized network.c file, the LL ATON driver functions that are used, and the code for purely delegated software operations, which is part of the optimized network runtime library. For hybrid epochs (calls to LL_ATON_LIB_xxx functions), a specific callback-based mechanism directly invokes the services of the stack. These system callback functions are provided by the static part of the NPU stack and are registered during the install phase of the model (see the ll_aton_reloc_install() function).
Solution overview - Memory model
As illustrated by the following figure, using runtime loadable models requires that a part of the internal RAMs (RW AI regions) be fixed/reserved at absolute addresses. These regions should not be used by the application during inference. A non-fixed executable RAM region is also required to install a given model. This RW region is used to resolve the relocatable references during the installation process at runtime.
Item | Definition |
---|---|
executable RAM region | Designates the memory region used to install a model at runtime (txt/data sections). This MCU/NPU memory-mapped region, which can be located anywhere in the memory subsystem, can be reserved at runtime by the application. The minimum requested size is returned by the ll_aton_reloc_get_info() function. Attributes: MCU RWX, NPU RO. For performance reasons, the region should be fully cached (MCU). (*) |
RW AI regions | Designates the memory regions reserved for the activations/parameters. Base addresses are absolute and are defined in the mem-pool descriptor files. Attributes: MCU/NPU memory-mapped, MCU RW, NPU RW. The ll_aton_reloc_get_mem_pool_desc() function can be used to check which memory regions are used. (**) |
RW AI region (external RAM) | If requested, designates the memory region allocated/reserved to hold the part of the memory pool defined for the external RAM. The base address can be relative (the region can be placed anywhere in the external RAM) or absolute. The referenced addresses are resolved during the install process at runtime. The region can be reserved at runtime by the application. The requested size is returned by the ll_aton_reloc_get_info() function. Attributes: MCU RWX, NPU RW. (**) |
Model X (weights/params) | Designates the memory regions where the relocatable binary models are stored. Base addresses (file_ptr) are relative and dependent on the application managing the FOTA-like mechanism. Attributes: MCU/NPU memory-mapped, MCU RO, NPU RO. |
App | Designates the memory regions reserved for the application (static part). It embeds a minimal part (NN load service) that loads/installs and executes a relocatable model. |
(*) This region must also be accessible by the NPU when the epoch controller is enabled.
(**) These regions should also be accessible by the MCU when an operator/epoch is delegated to the MCU.
Limitations
- Only the STM32N6-based series with Neural-ART accelerator is supported.
- Addresses of the internal/on-chip memory pools used are fixed/absolute ('USEMODE_ABSOLUTE' attribute). Only the addresses relative to the off-chip memories (flash and RAM types) are relocatable/relative ('USEMODE_RELATIVE' attribute).
- Only two relocatable/relative memory pools are supported: one for the RO region handling the weights/params and another for an external RW memory region. Note that the external RAM region can also be absolute.
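The limitations above can be checked before generation with a few lines of Python. This is a minimal sketch, not part of the ST scripts: `check_pools` is a hypothetical helper, and it assumes the memory-pool entries have already been extracted into a list of dictionaries with the `fname`, `mode`, and `prop.rights` keys that appear in the descriptor excerpt later in this document. Treating an entry without a `mode` key as absolute is also an assumption.

```python
def check_pools(pools):
    """Return a list of problems found in a list of memory-pool entries.

    Hypothetical helper: each entry is a dict shaped like the descriptor
    excerpt in this document ("fname", "mode", "prop": {"rights": ...}).
    An entry without a "mode" key is treated as absolute (assumption).
    """
    problems = []
    relative = [p for p in pools if p.get("mode") == "USEMODE_RELATIVE"]
    if len(relative) > 2:
        names = ", ".join(p["fname"] for p in relative)
        problems.append("more than two relative pools: " + names)
    read_only = [p for p in relative if p["prop"]["rights"] == "ACC_READ"]
    read_write = [p for p in relative if p["prop"]["rights"] == "ACC_WRITE"]
    if len(read_only) > 1:
        problems.append("more than one relative read-only (params) pool")
    if len(read_write) > 1:
        problems.append("more than one relative read/write (external RAM) pool")
    return problems


# Entries reduced to the keys used by the check.
pools = [
    {"fname": "AXISRAM6", "prop": {"rights": "ACC_WRITE"}},
    {"fname": "xSPI1", "prop": {"rights": "ACC_WRITE"}, "mode": "USEMODE_RELATIVE"},
    {"fname": "xSPI2", "prop": {"rights": "ACC_READ"}, "mode": "USEMODE_RELATIVE"},
]
print(check_pools(pools))
```

An empty list means that the entries respect the two-pool limit. The check does not verify that a relative pool really maps to an off-chip memory.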
Setting up a work environment
The generation of a runtime loadable model is not integrated into the ST Edge AI Core CLI. Instead, a set of specific Python scripts is provided in the $STEDGEAI_CORE_DIR/scripts/N6_reloc directory. The npu_driver.py script serves as the entry point for generating the runtime loadable model.
%STEDGEAI_CORE_DIR% represents the root location where the ST Edge AI Core components are installed, typically in a path like "<tools_dir>/STEdgeAI/<version>/".
Requirements
To use these scripts, you need Python 3.9+ and the following Python modules:
pyelftools==0.27
tabulate
colorama
Additionally, ensure that the Make utility and a GNU Arm Embedded toolchain supporting the Arm® Cortex®-M55 (with the 'arm-none-eabi-' prefix) are available in your PATH.
Note
The Python interpreter available in the ST Edge AI Core pack can be used directly (all required Python modules are already installed).
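The requirements can be verified with a short script before running the generator. This is a minimal sketch using only the Python standard library: the helper names are hypothetical, the executable names `make` and `arm-none-eabi-gcc` are assumed from the requirements above, and only the presence of the modules is checked, not the pyelftools version.

```python
import importlib.util
import shutil


def missing_tools(tools=("make", "arm-none-eabi-gcc")):
    """Return the executables from `tools` that are not found in PATH."""
    return [t for t in tools if shutil.which(t) is None]


def missing_modules(modules=("elftools", "tabulate", "colorama")):
    """Return the Python modules from `modules` that cannot be located.

    Note: the pyelftools package is imported under the name "elftools".
    """
    return [m for m in modules if importlib.util.find_spec(m) is None]


if __name__ == "__main__":
    print("missing tools  :", missing_tools())
    print("missing modules:", missing_modules())
```

Two empty lists indicate that the basic prerequisites are in place for the interpreter used to run the check.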
Optional Tools
For validating a runtime loadable model on the STM32N6570-DK board, you may optionally install:
- STM32CubeIDE supporting the STM32N6 series (version 1.17.0 or higher)
- stm_ai_runner Python package
The STM32_CUBE_IDE_DIR system environment variable can be set to indicate the installation folder.
Getting started
Generating a RT loadable model
The generation of a runtime loadable model involves two main steps. The first step generates the specialized C files and the associated memory initializers. The second step generates the binary object (or relocatable module) representing the runtime loadable model.
Generating the specialized c-files
The first step involves generating the specialized files and associated memory initializers. There are no restrictions on the options used for this step, except that the --all-buffers-info option must be used to obtain detailed information about all the intermediate/activation buffers. Note that this extra generated information is removed during the generation of the runtime loadable model. For the memory-pool descriptor file, the following attributes are mandatory:
- "mode": "USEMODE_RELATIVE" for external memory pools only (flash and RAM). Otherwise, use 'USEMODE_ABSOLUTE'. Note that since v1.3, the external RAM region can also be absolute.
- "fformat": "FORMAT_RAW" for all memory initializers.
Generate the specialized c-files:
$ stedgeai generate -m <quantize_model> --target stm32n6 --st-neural-art <profile>@<conf_file>
By default, the generated files are stored in the .\st_ai_output folder:
network.c network_atonbuf.xSPI2.raw network_c_info.json
Note that memory initializers are not required for the memory regions that are used only for activations.
Generating the relocatable binary model
The second step involves postprocessing and compiling the output from the first step to generate a binary object (or relocatable module) representing the runtime loadable model. This step ensures that the final code/data size is accurate and ready for deployment.
The npu_driver.py script is the entry point for executing the steps required to generate the loadable model. The '--input/-i' option specifies the location of the generated "network.c" file and the associated memory initializers (*.raw files). The default value is ./st_ai_output. The '--output/-o' option specifies the output folder (default: ./build).
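When the generation is driven from another script, the command line can be assembled programmatically. The sketch below is an illustration only: `npu_driver_cmd` is a hypothetical helper, the installation path in the example call is made up, and only flags listed in the usage text of npu_driver.py are emitted.

```python
import os


def npu_driver_cmd(input_path="st_ai_output/network.c", output_dir="build",
                   name=None, split=False, no_secure=False,
                   ecblob_in_params=False, core_dir=None):
    """Return the argument list used to launch npu_driver.py."""
    core_dir = core_dir or os.environ.get("STEDGEAI_CORE_DIR")
    if not core_dir:
        raise ValueError("STEDGEAI_CORE_DIR is not set and core_dir was not given")
    script = core_dir.rstrip("/\\") + "/scripts/N6_reloc/npu_driver.py"
    cmd = ["python", script, "-i", input_path, "-o", output_dir]
    if name:
        cmd += ["-n", name]
    if split:
        cmd.append("--split")
    if no_secure:
        cmd.append("--no-secure")
    if ecblob_in_params:
        cmd.append("--ecblob-in-params")
    return cmd


# "/opt/STEdgeAI/2.2" is a made-up installation path for illustration.
print(" ".join(npu_driver_cmd(core_dir="/opt/STEdgeAI/2.2", split=True)))
```

The returned list can be passed as is to `subprocess.run(cmd, check=True)`.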
[Details] npu_driver.py script
usage: npu_driver.py [-h] [--input STR] [--output STR] [--name STR] [--no-secure] [--no-dbg-info]
[--ecblob-in-params] [--split] [--llvm] [--st-clang] [--compatible-mode]
[--custom [STR]] [--cross-compile STR]
[--gen-c-file] [--parse-only] [--no-clean]
[--log [STR]] [--verbosity [{0,1,2}]] [--debug] [--no-color]
NPU Utility - Relocatable model generator v1.3
optional arguments:
-h, --help show this help message and exit
--input STR, -i STR location of the generated c-files (or network.c file path)
--output STR, -o STR output directory
--name STR, -n STR basename of the generated c-files (default=<network-file-name>)
--no-secure generate binary model for non secure context
--no-dbg-info generate binary model without LL_ATON_EB_DBG_INFO
--ecblob-in-params place the EC blob in param section
--split generate a separate binary file for the params/weights
--llvm use LLVM compiler and libraries (default: GCC compiler is used)
--st-clang use ST CLANG compiler and libraries (default: GCC compiler is used)
--compatible-mode set the compatible option (target dependent)
--custom [STR] config file for custom build (default: custom.json)
--cross-compile STR prefix of the ARM tool-chain (CROSS_COMPILE env variable can be used)
--gen-c-file generate c-file image (DEBUG PURPOSE)
--parse-only parsing only the generated c-files
--no-clean Don't clean the intermediate files
--log [STR] log file
--verbosity [{0,1,2}], -v [{0,1,2}]
set verbosity level
--debug Enable internal log (DEBUG PURPOSE)
--no-color Disable log color support
$ python $STEDGEAI_CORE_DIR/scripts/N6_reloc/npu_driver.py -i st_ai_output/network.c -o build
...
XIP size = 29,008 (0x7150) data+got+bss sections
COPY size = 173,512 (0x2a5c8) +ro sections
PARAMS offset = 176,064 (0x2afc0)
PARAMS size = 1,793,272 (0x1b5cf8)
┌────────────────────────┬────────────────────────────────┬────────┬──────────┬─────────┐
│ name (addr) │ flags │ foff │ dst │ size │
├────────────────────────┼────────────────────────────────┼────────┼──────────┼─────────┤
│ xSPI2 (20009f84) │ 01010500 RELOC.PARAM.0.RCACHED │ 0 │ 00000000 │ 1793265 │
│ AXISRAM5 (20009f8a) │ 03020200 RESET.ACTIV.WRITE │ 0 │ 342e0000 │ 352800 │
│ <undefined> (00000000) │ 00000000 UNUSED │ 0 │ 00000000 │ 0 │
└────────────────────────┴────────────────────────────────┴────────┴──────────┴─────────┘
Table: mempool c-descriptors (off=40005800, 3 entries, from RAM)
rt_ctx: c_name="network", acts_sz=352,800, params_sz=1,793,265, ext_ram_sz=0
rt_ctx: rt_version_desc="atonn-v1.1.1-13-g0b0b607eb (RELOC.GCC)"
Generating files...
creating "build\network_rel.bin" (size=1,969,336)
Final information returned by npu_driver.py:
item | description |
---|---|
XIP size | Indicates the size (in bytes) of the executable RAM region required in XIP mode |
COPY size | Indicates the size (in bytes) of the executable RAM region required in COPY mode |
PARAMS size | Indicates the size (in bytes) of the weights/params sections |
rt_ctx | Indicates the information extracted from the binary header |
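A build script may want to pick these sizes out of the npu_driver.py log, for example to compare them with the RAM budget of the application. The following sketch assumes the log lines have exactly the layout shown in the example output above, which may differ between script versions; `parse_sizes` is a hypothetical helper.

```python
import re

# Matches lines such as "XIP size = 29,008 (0x7150) data+got+bss sections".
_SIZE_RE = re.compile(
    r"^\s*(XIP size|COPY size|PARAMS offset|PARAMS size)\s*=\s*([\d,]+)")


def parse_sizes(log_text):
    """Return {"XIP size": int, ...} for the size lines found in the log."""
    sizes = {}
    for line in log_text.splitlines():
        match = _SIZE_RE.match(line)
        if match:
            sizes[match.group(1)] = int(match.group(2).replace(",", ""))
    return sizes


log = """\
XIP size = 29,008 (0x7150) data+got+bss sections
COPY size = 173,512 (0x2a5c8) +ro sections
PARAMS offset = 176,064 (0x2afc0)
PARAMS size = 1,793,272 (0x1b5cf8)
"""
sizes = parse_sizes(log)
print(sizes)
```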
The “mempool c-descriptors” table indicates the memory regions (part of the user memory-pools) and the flags which are considered by the ll_aton_reloc_install() function at runtime.
flag | description |
---|---|
RELOC | Indicates a relocatable region, the address (dst=0) will be resolved at runtime during the installation process. |
PARAM/ACTIV/MIXED | Indicates the type of contents: PARAM: params/weights only, ACTIV: activations only, MIXED: mixed |
RCACHED/WCACHED | Indicates that a part of the memory region can be accessed through the NPU cache. RCACHED is associated with a RELOC.PARAM/read-only region |
WRITE | Indicates that the memory region is a read-write memory region. |
RESET | Indicates that the memory region can be cleared if the AI_RELOC_RT_LOAD_MODE_CLEAR option is used. |
COPY | Indicates that the region is initialized/copied during the installation process. |
UNUSED | Last entry in the mempool c-descriptors |
- The number '0' or '1' indicates the ID of the relocatable memory region. Currently only two regions are supported: '0' for a params/weights-only region in the external flash and '1' for a read/write memory region in the external RAM.
- 'foff' indicates the offset in the 'params/weights' section where the associated memory initializer is found, when requested.
- 'dst' indicates the destination address. If it is not equal to zero, the address is absolute; otherwise the region is relocatable.
- 'size' indicates the size in bytes.
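The column semantics described above can be applied mechanically to a row of the "mempool c-descriptors" table. The sketch below is a hypothetical log-parsing helper, not an ST API: it assumes the row layout of the example tables in this document and interprets only the textual flag labels, leaving the hexadecimal flag word undecoded.

```python
def parse_mempool_row(row):
    """Parse one data row of the "mempool c-descriptors" table."""
    cells = [c.strip() for c in row.strip().strip("│").split("│")]
    name_addr, flags, foff, dst, size = cells
    raw, _, labels = flags.partition(" ")
    labels = labels.split(".")
    dst_addr = int(dst, 16)  # 'dst' is printed in hexadecimal
    return {
        "name": name_addr.split(" (")[0],
        "raw_flags": int(raw, 16),
        "labels": labels,
        # A bare digit among the labels is the relocatable region ID.
        "reloc_id": next((int(lab) for lab in labels if lab.isdigit()), None),
        # Relocatable regions carry the RELOC label and dst == 0.
        "relocatable": "RELOC" in labels and dst_addr == 0,
        "foff": int(foff),
        "dst": dst_addr,
        "size": int(size),
    }


row = "│ xSPI2 (20009f84) │ 01010500 RELOC.PARAM.0.RCACHED │ 0 │ 00000000 │ 1793265 │"
print(parse_mempool_row(row))
```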
Example with a tiny model using only the internal NPU RAM for the activations and weights/params.
At runtime, during the installation process, AXISRAM2, AXISRAM3, and AXISRAM4 (absolute addresses) are initialized with the contents of the params/weights section. AXISRAM5 is used only for the activations.
XIP size = 14,192 (0x3770) data+got+bss sections
COPY size = 173,576 (0x2a608) +ro sections
PARAMS offset = 174,888 (0x2ab28)
PARAMS size = 1,966,080 (0x1e0000)
┌────────────────────────┬────────────────────────────┬────────┬──────────┬─────────┐
│ name (addr) │ flags │ foff │ dst │ size │
├────────────────────────┼────────────────────────────┼────────┼──────────┼─────────┤
│ AXISRAM5 (20009f84) │ 03020200 RESET.ACTIV.WRITE │ 0 │ 342e0000 │ 180000 │
│ AXISRAM4 (20009f8d) │ 02030200 COPY.MIXED.WRITE │ 0 │ 34270000 │ 458752 │
│ AXISRAM3 (20009f96) │ 02030200 COPY.MIXED.WRITE │ 458752 │ 34200000 │ 458752 │
│ AXISRAM2 (20009f9f) │ 02010100 COPY.PARAM.READ │ 917504 │ 34100000 │ 1048572 │
│ <undefined> (00000000) │ 00000000 UNUSED │ 0 │ 00000000 │ 0 │
└────────────────────────┴────────────────────────────┴────────┴──────────┴─────────┘
Table: mempool c-descriptors (off=40001df8, 5 entries, from RAM)
rt_ctx: c_name="network", acts_sz=352,815, params_sz=1,793,261, ext_ram_sz=0
rt_ctx: rt_version_desc="atonn-v1.1.1-13-g0b0b607eb (RELOC.GCC)"
Generating files...
creating "build\network_rel.bin" (size=2,140,968)
Example with a model using only the external RAM/FLASH (internal RAMs are not used)
At runtime during the installation process, the references relative to the xSPI1/xSPI2 (relative address) are respectively resolved to the external RAM address (reserved by the application) and the params/weights section (part of the installed relocatable module).
XIP size = 70,984 (0x11548) data+got+bss sections
COPY size = 174,816 (0x2aae0) +ro sections
PARAMS offset = 179,784 (0x2be48)
PARAMS size = 1,793,272 (0x1b5cf8)
┌────────────────────────┬────────────────────────────────┬────────┬──────────┬─────────┐
│ name (addr) │ flags │ foff │ dst │ size │
├────────────────────────┼────────────────────────────────┼────────┼──────────┼─────────┤
│ xSPI1 (2000a4a0) │ 01020601 RELOC.ACTIV.1.WCACHED │ 0 │ 00000000 │ 352800 │
│ xSPI2 (2000a4a6) │ 01010500 RELOC.PARAM.0.RCACHED │ 0 │ 00000000 │ 1793265 │
│ <undefined> (00000000) │ 00000000 UNUSED │ 0 │ 00000000 │ 0 │
└────────────────────────┴────────────────────────────────┴────────┴──────────┴─────────┘
Table: mempool c-descriptors (off=4000fbf8, 3 entries, from RAM)
rt_ctx: c_name="network", acts_sz=352,800, params_sz=1,793,265, ext_ram_sz=352,800
rt_ctx: rt_version_desc="atonn-v1.1.1-13-g0b0b607eb (RELOC.GCC)"
Epoch controller consideration
If the epoch controller is enabled, the generated command streams (also called ecblobs) are stored as const data in the rodata section by default. The npu_driver.py script reports an extra table giving the size of each blob (_ec_blob_X). To be able to update the blobs at runtime with the references of the relocatable memory pools, extra bss space is reserved (reloc=r:binary). Consequently, the executable RAM required to install a model (XIP or COPY mode) is significantly larger. If a specific blob references only absolute addresses, no extra bss space is reserved for it (its bss and reloc columns are empty in the table).
With epoch controller support:
XIP size = 247,448 (0x3c698) data+got+bss sections
COPY size = 504,192 (0x7b180) +ro sections
PARAMS offset = 260,048 (0x3f7d0)
PARAMS size = 1,758,216 (0x1ad408)
...
┌──────────────────────┬─────────┬───────────┬──────────┐
│ Name │ bss │ ro data │ reloc │
├──────────────────────┼─────────┼───────────┼──────────┤
│ _ec_blob_Default_1 │ 123,896 │ 124,248 │ r:binary │
│ _ec_blob_Default_63 │ 22,176 │ 22,272 │ r:binary │
│ _ec_blob_Default_74 │ 21,064 │ 21,168 │ r:binary │
│ _ec_blob_Default_85 │ 70,912 │ 71,112 │ r:binary │
│ _ec_blob_Default_110 │ 6,576 │ 6,648 │ r:binary │
│ _ec_blob_Default_116 │ │ 368 │ │
│ _ec_blob_Default_118 │ │ 832 │ │
│ │ │ │ │
│ total │ 244,624 │ 246,648 │ │
└──────────────────────┴─────────┴───────────┴──────────┘
Table: EC blob objects (7)
Same model, with epoch controller support and the --ecblob-in-params option.
The 'COPY size' is reduced because the const data no longer includes the ecblobs. The non-patched ecblobs are fetched from the flash by the epoch controller.
XIP size = 247,464 (0x3c6a8) data+got+bss sections
COPY size = 257,504 (0x3ede0) +ro sections
PARAMS offset = 13,384 (0x3448)
PARAMS size = 2,004,864 (0x1e9780)
Same model, without epoch controller support
XIP size = 29,008 (0x7150) data+got+bss sections
COPY size = 173,512 (0x2a5c8) +ro sections
PARAMS offset = 176,064 (0x2afc0)
PARAMS size = 1,793,272 (0x1b5cf8)
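The effect of these options on the executable RAM can be read directly from the three reports above. The short calculation below only restates figures already printed in this section: it shows that the COPY size saved by --ecblob-in-params is within 40 bytes of the "ro data" total of the EC blob table, and how much the XIP size grows when the epoch controller is enabled.

```python
# Figures copied from the three npu_driver.py reports above (bytes).
ec_default = {"xip": 247_448, "copy": 504_192}    # epoch controller enabled
ec_in_params = {"xip": 247_464, "copy": 257_504}  # + --ecblob-in-params
no_ec = {"xip": 29_008, "copy": 173_512}          # epoch controller disabled

blob_ro_total = 246_648  # "ro data" total of the EC blob table

copy_saving = ec_default["copy"] - ec_in_params["copy"]
xip_growth = ec_default["xip"] - no_ec["xip"]

print(f"COPY saving with --ecblob-in-params: {copy_saving:,} bytes")
print(f"EC blob ro data total              : {blob_ro_total:,} bytes")
print(f"XIP growth with epoch controller   : {xip_growth:,} bytes")
```

These numbers apply to this particular model only; other models and memory configurations give different ratios.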
Weights/params encryption consideration
If the model is generated with the '--encrypt-weights' option to support encrypted weights/params, the weights/params (network_atonbuf.xSPI2.raw file) should be encrypted before generating the relocatable binary model, as for the nonrelocatable model. It is recommended to use the '--split' option so that the address of the weights/params in the flash can be fixed, because the encryption depends on the location.
To use a model with encrypted weights/params, the index/keys for the different bus interfaces should be set before executing the model.
...
LL_ATON_RT_Reset_Network(&nn_instance);

// Set bus interface keys -- used for encrypted inference only
LL_Busif_SetKeys(0, 0, BUSIF_LSB_KEY, BUSIF_MSB_KEY);
LL_Busif_SetKeys(0, 1, BUSIF_LSB_KEY, BUSIF_MSB_KEY);
LL_Busif_SetKeys(1, 0, BUSIF_LSB_KEY, BUSIF_MSB_KEY);
LL_Busif_SetKeys(1, 1, BUSIF_LSB_KEY, BUSIF_MSB_KEY);

do {
  /* Execute first/next step of Cube.AI/ATON runtime */
  ll_aton_rt_ret = LL_ATON_RT_RunEpochBlock(&nn_instance);
  ...
Extra options
The following additional options can be specified:
- --no-secure: Indicates that the generated runtime loadable model will be executed in a nonsecure context. The setting of the NPU processing units uses the nonsecure aliased address (NPU_BASE_NS). Otherwise, the secure aliased address is used (NPU_BASE_S).
- --split: Specifies that the parameters/weights sections should be generated in a separate binary file.
- --pack-dir: If the STEDGEAI_CORE_DIR system environment variable is not defined, this option defines the root location of the ST Edge AI Core installation folder.
“--split” option
This option generates two separate files. In this case, both files should be deployed on the target, and the address of the "network_rel_params.bin" file should be passed during the installation process at runtime (see the ll_aton_reloc_install() function).
$ python $STEDGEAI_CORE_DIR/scripts/N6_reloc/npu_driver.py -i st_ai_output/network.c -o build --split
...
XIP size = 28,472 (0x6f38) data+got+bss sections
COPY size = 169,488 (0x29610) +ro sections
PARAMS offset = 0 (0x0)
PARAMS size = 0 (0x0)
┌────────────────────────┬────────────────────────────────┬────────┬──────────┬─────────┐
│ name (addr) │ flags │ foff │ dst │ size │
├────────────────────────┼────────────────────────────────┼────────┼──────────┼─────────┤
│ xSPI2 (2000a3bf) │ 01010500 RELOC.PARAM.0.RCACHED │ 0 │ 00000000 │ 1793265 │
│ AXISRAM5 (2000a3c5) │ 03020200 RESET.ACTIV.WRITE │ 0 │ 342e0000 │ 352800 │
│ <undefined> (00000000) │ 00000000 UNUSED │ 0 │ 00000000 │ 0 │
└────────────────────────┴────────────────────────────────┴────────┴──────────┴─────────┘
Table: mempool c-descriptors (off=400055b4, 3 entries, from RAM)
Generating files...
creating "build\network_rel.bin" (size=172,032)
creating "build\network_rel_params.bin" (size=1,793,272)
“--ecblob-in-params” option
The --ecblob-in-params option indicates that the ecblobs (const part) are placed in the params/weights memory section. This reduces the executable memory region required to execute the model in COPY mode. The ecblobs are fetched from the external flash unless they are copied to the executable RAM to be updated. The ecblobs are placed at the beginning of the params/weights memory section. The --split option is still supported; in that case, the params file includes the ecblobs.
“--name/-n” option
The --name/-n option specifies/overwrites the expected c-name/file-name of the runtime loadable model. By default, the name of the generated network files is used.
Default behavior.
$ python npu_driver.py -i <gen-dir>/network.c
...
Generating files...
creating "build\network_rel.bin" (size=..)
$ python npu_driver.py -i <gen-dir>/my_model.c
...
Generating files...
creating "build\network_rel.bin" (size=..)
$ python npu_driver.py -i <gen-dir>/network.c -n toto
...
Generating files...
creating "build\toto_rel.bin" (size=..)
Format of the relocatable binary model
The following figure illustrates the layout of the generated relocatable binary model ("network_rel.bin" file). By default, the memory initializers (params/weights sections) are included in the image. If the --split option is used, the params/weights sections are generated in a separate binary file ("network_rel_params.bin" file).
- If the epoch controller is enabled, the generated bitstreams are included in the rodata/data sections by default. The --ecblob-in-params option can be used to store the ecblobs with the params section.
- The description of the params/weights sections is defined in the rodata section.
- When the '--split' option is used, the address ('ext_params_add') is defined by the application and passed when the ll_aton_reloc_install() function is called.
Evaluating the RT loadable model (STM32N6570-DK board)
A ready-to-use environment for the STM32N6570-DK board (DEV mode) is delivered in the ST Edge AI Core pack. It allows a classical validation to be performed with the 'validate' command or through the stm_ai_runner Python package. Note that an STM32CubeIDE environment must be available.
Warning
Set the boot mode to development mode (BOOT1 switch position is 1-3, BOOT0 switch position does not matter). After the loading phase, the board must not be switched off or disconnected, so that the validation can be performed.
After the generation of the relocatable binary model, the st_load_and_run.py Python script is used to flash the binary files at the fixed addresses, then to load and run the aiValidation firmware. After these steps, a single inference is executed and its performance is reported.
[Details] st_load_and_run.py script
usage: st_load_and_run.py [-h] [--input STR] [--board STR] [--address STR] [--mode STR] [--cube-ide-dir STR]
[--log [STR]] [--verbosity [{0,1,2}]] [--debug] [--no-color]
NPU Utility - ST Load and run (dev environment) v1.2
optional arguments:
-h, --help show this help message and exit
--input STR, -i STR location of the binary files (default: build/network_rel.bin)
--board STR ST development board (default: stm32n6570-dk)
--address STR destination address - model(,params) (default: 0x71000000,0x71800000)
--mode STR fw variants: copy,xip[no-flash,no-run,usbc,ext,max-speed]
--cube-ide-dir STR installation directory of STM32CubeIDE tools (ex. ~/ST/STM32CubeIDE_1.18.0/STM32CubeIDE)
--log [STR] log file
--verbosity [{0,1,2}], -v [{0,1,2}]
set verbosity level
--debug enable internal log (DEBUG PURPOSE)
--no-color disable log color support
$ python $STEDGEAI_CORE_DIR/scripts/N6_reloc/st_load_and_run.py
NPU Utility - ST Load and run (dev environment) (version 1.2)
Creating date : Mon Apr 14 23:14:20 2025
Entry point : 'build\network_rel.bin'
Board : 'stm32n6570-dk'
mode : ['copy']
split model : 'build\network_rel_params.bin'
clang mode : False
exec size : XIP=247,464, COPY=257,504
board size : int=655,360, ext=4,194,304
install mode : 'copy'
Resetting the board.
Flashing 'build\network_rel.bin' at address 0x71000000 (size=2136936)..
Loading & start the validation application 'stm32n6570-dk-validation-reloc-copy'..
Deployed model is started and ready to be used.
Executing the deployed model (desc=serial:921600)..
...
125 - epoch (HYBRID) 0.071 0.6% 99.5% [ 81 41 56,540 ] EpochBlock_117
126 - epoch 0.025 0.2% 99.7% [ 2,880 15,560 1,323 ] EpochBlock_118
127 - epoch 0.032 0.3% 100.0% [ 7,456 17,432 747 ] EpochBlock_119
-------------------------------------------------------------------------------------------------------------
total 11.808 [ 1,948,894 7,004,597 493,125 ]
84.69 inf/s [ 20.6% 74.1% 5.2% ]
-------------------------------------------------------------------------------------------------------------
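The default '--address' value of st_load_and_run.py places the model image and, in split mode, the params file at two fixed flash addresses. Before flashing, it can be worth checking that the model image ends before the params address. The helpers below are a hypothetical sketch of that check; they only interpret the "model(,params)" syntax documented in the help text above.

```python
DEFAULT_ADDRESS = "0x71000000,0x71800000"  # default listed in the help text


def parse_addresses(value=DEFAULT_ADDRESS):
    """Split an '--address' value into (model_addr, params_addr or None)."""
    parts = [int(p, 16) for p in value.split(",")]
    params = parts[1] if len(parts) > 1 else None
    return parts[0], params


def model_fits(model_size, value=DEFAULT_ADDRESS):
    """True if the model image ends at or before the params address."""
    model_addr, params_addr = parse_addresses(value)
    return params_addr is None or model_addr + model_size <= params_addr


# Size of "network_rel.bin" in the --split example of this document.
print(model_fits(172_032))
```

With the default addresses, the model image can occupy up to 0x800000 bytes (8 MiB) before reaching the params address.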
Evaluate the performances
$ stedgeai validate -m <quantize_model> --target stm32n6 --mode target -d serial:921600
...
Evaluation report (summary)
-------------------------------------------------------------------------------------------------------------
Output acc rmse ... std nse cos tensor
-------------------------------------------------------------------------------------------------------------
X-cross #1 n.a. 0.007084151 ... 0.007081 0.999671 0.999910 10 x uint8(1x3087x6)
-------------------------------------------------------------------------------------------------------------
Deploy and use a relocatable model
No specific service is provided to deploy the model on a target: FOTA-like mechanisms and other stacks used to manage the firmware or models are application-specific. However, once the relocatable model is flashed on the target at a given memory-mapped address (file_ptr), the ll_aton_reloc_install() function must be called to install the model.
LL_ATON stack configuration
LL_ATON_XX C-defines | comment |
---|---|
LL_ATON_PLATFORM | 'LL_ATON_PLAT_STM32N6' mandatory |
LL_ATON_EB_DBG_INFO | mandatory |
LL_ATON_RT_RELOC | mandatory - Enables the code paths and functionalities required to manage the relocatable mode. |
LL_ATON_RT_MODE | LL_ATON_RT_ASYNC is recommended but LL_ATON_RT_POLLING can be used. |
LL_ATON_OSAL | no restriction |
Minimal code
The following code snippet illustrates how to install and use a runtime loadable model in a bare-metal environment, for a single network with a single input tensor and a single output tensor (with or without the epoch controller). The user_model_mgr() function is used to retrieve the address where the module has been flashed.
#include "ll_aton_reloc_network.h"

static NN_Instance_TypeDef nn_instance; /* LL ATON handle */

uint8_t *input_0, *prediction_0;
uint32_t input_size_0, prediction_size_0;

int ai_init(const uintptr_t file_ptr, const uintptr_t file_params_ptr)
{
  const LL_Buffer_InfoTypeDef *ll_buffer;
  ll_aton_reloc_info rt;
  int res;

  /* Retrieve the requested RAM size to install the model */
  res = ll_aton_reloc_get_info(file_ptr, &rt);

  /* Reserve executable memory region to install the model */
  uintptr_t exec_ram_addr = reserve_exec_memory_region(rt.rt_ram_copy);

  /* Reserve external read/write memory region for external RAM region */
  uintptr_t ext_ram_addr = 0;
  if (rt.ext_ram_sz > 0)
    ext_ram_addr = reserve_ext_memory_region(rt.ext_ram_sz);

  /* Create and install an instance of the relocatable model */
  ll_aton_reloc_config config;
  config.exec_ram_addr = exec_ram_addr;
  config.exec_ram_size = rt.rt_ram_copy;
  config.ext_ram_addr = ext_ram_addr;
  config.ext_ram_size = rt.ext_ram_sz;
  config.ext_param_addr = NULL; /* or @ of the weights/params if split mode is used */
  config.mode = AI_RELOC_RT_LOAD_MODE_COPY; // | AI_RELOC_RT_LOAD_MODE_CLEAR;

  res = ll_aton_reloc_install(file_ptr, &config, &nn_instance);

  /* A zero return value is treated as success, consistent with the check in main() */
  if (res == 0)
  {
    /* Retrieve the addresses of the input/output buffers */
    ll_buffer = ll_aton_reloc_get_input_buffers_info(&nn_instance, 0);
    input_0 = LL_Buffer_addr_start(ll_buffer);
    input_size_0 = LL_Buffer_len(ll_buffer);
    ll_buffer = ll_aton_reloc_get_output_buffers_info(&nn_instance, 0);
    prediction_0 = LL_Buffer_addr_start(ll_buffer);
    prediction_size_0 = LL_Buffer_len(ll_buffer);

    /* Init the LL ATON stack and the instantiated model */
    LL_ATON_RT_RuntimeInit();
    LL_ATON_RT_Init_Network(&nn_instance);
  }
  return res;
}

void ai_deinit(void)
{
  LL_ATON_RT_DeInit_Network(&nn_instance);
  LL_ATON_RT_RuntimeDeInit();
}

void ai_run(void)
{
  LL_ATON_RT_RetValues_t ll_aton_ret;

  LL_ATON_RT_Reset_Network(&nn_instance);
  do {
    ll_aton_ret = LL_ATON_RT_RunEpochBlock(&nn_instance);
    if (ll_aton_ret == LL_ATON_RT_WFE) {
      LL_ATON_OSAL_WFE();
    }
  } while (ll_aton_ret != LL_ATON_RT_DONE);
}

void main(void)
{
  uintptr_t file_ptr, file_params_ptr;

  user_system_init(); /* HAL, clocks, NPU sub-system... */

  user_model_mgr(&file_ptr, &file_params_ptr);
  if (ai_init(file_ptr, file_params_ptr)) {
    /*... installation issue ..*/
  }

  while (user_app_not_finished()) {
    /* Fill input buffers */
    user_fill_inputs(input_0);
    /* If requested, perform the NPU/MCU cache operations to guarantee the coherency of the memory. */
    // LL_ATON_Cache_MCU_Clean_Invalidate_Range(input_0, input_size_0);
    // LL_ATON_Cache_MCU_Invalidate_Range(prediction_0, prediction_size_0);
    /* Perform a complete inference */
    ai_run();
    /* Post-process the predictions */
    user_post_process(prediction_0);
  }
  ai_deinit();
  //...
}
XIP/COPY execution modes
XIP execution mode
This mode executes the code directly from the memory where it is stored, without copying it to another location. It is efficient in terms of memory usage: the executable RAM region is only required to store the data/bss/got sections.
COPY execution mode
This mode copies the code to a different memory location before execution. This can be useful for optimizing performance or managing memory access speeds. The required size of the executable RAM region is larger: in addition to the data/bss/got sections, it also includes the txt/rodata sections.
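The trade-off between the two modes can be expressed as a simple selection rule on the sizes reported by npu_driver.py. The function below is a hypothetical illustration, not part of the ST tooling; preferring COPY whenever it fits is a policy chosen for this sketch, not a rule stated by the documentation.

```python
def pick_load_mode(xip_size, copy_size, exec_ram_available):
    """Return "copy", "xip", or None depending on what fits in exec RAM.

    Sketch policy (assumption): use COPY mode when the whole image fits,
    fall back to XIP when only the data/bss/got sections fit.
    """
    if copy_size <= exec_ram_available:
        return "copy"
    if xip_size <= exec_ram_available:
        return "xip"
    return None


# Sizes taken from the st_load_and_run.py example in this document:
# exec size XIP=247,464, COPY=257,504 and internal board size 655,360.
print(pick_load_mode(247_464, 257_504, 655_360))
```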
Example of NPU compiler configuration files
Memory-pool descriptor files
The "%STEDGEAI_CORE_DIR%/scripts/N6_reloc/test" directory contains a set of configuration and memory-pool descriptor files that can be used. There are two main points to consider:
- If a nonsecure context is used to execute the deployed model, the base address of the different memory pools should be set to a nonsecure address (for example, 0x24350000 instead of 0x34350000 for the AXISRAM6 memory).
- For the memory pools representing an off-chip device, the "USEMODE_RELATIVE" attribute should be used.
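The address example above (0x24350000 instead of 0x34350000) differs only in bit 28. The helper below generalizes from that single example and assumes that every internal secure alias maps to its nonsecure counterpart by clearing this bit. This is an assumption to verify against the device memory map before relying on it, and `to_nonsecure` is a hypothetical name.

```python
SECURE_ALIAS_BIT = 0x1000_0000  # assumption: bit 28 selects the secure alias


def to_nonsecure(addr):
    """Return the nonsecure alias of an internal address (sketch)."""
    return addr & ~SECURE_ALIAS_BIT


print(hex(to_nonsecure(0x34350000)))
```

Applied to the "offset" value of each internal memory pool, this gives the base addresses to use in a descriptor intended for a nonsecure context.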
For test purposes with the STM32N6570-DK board, several ready-to-use configurations are provided.
memory-pool desc. file | description |
---|---|
stm32n6_reloc.mpool | Full memories. All NPU RAMs (AXISRAM3..6), AXISRAM2 and external RAM/flash are defined. NPU cache can be used for the external memories. |
stm32n6_int_reloc.mpool | NPU memories only. Only the NPU RAMs (AXISRAM3..6) are defined. |
stm32n6_int2_reloc.mpool | Internal memories only. Only the NPU RAMs (AXISRAM3..6) and AXISRAM2 are defined. |
stm32n6_ext.mpool | External memories only. Only the external RAM/flash are defined. NPU cache can be used for the external memories. |
For information and test purposes, the non-reloc memory-pool descriptor files are also provided; they can be used with a normal deployment flow.
Part of the ./test/mpool/stm32n6_reloc.mpool file:
...
{
"fname": "AXISRAM6",
"name": "npuRAM6",
"fformat": "FORMAT_RAW",
"prop": { "rights": "ACC_WRITE", "throughput": "HIGH", "latency": "LOW",
"byteWidth": 8, "freqRatio": 1.25, "read_power": 18.531, "write_power": 16.201 },
"offset": { "value": "0x34350000", "magnitude": "BYTES" },
"size": { "value": "448", "magnitude": "KBYTES" }
},
{
"fname": "xSPI1",
"name": "hyperRAM",
"fformat": "FORMAT_RAW",
"prop": { "rights": "ACC_WRITE", "throughput": "MID", "latency": "HIGH",
"byteWidth": 2, "freqRatio": 5.00, "cacheable": "CACHEABLE_ON",
"read_power": 380, "write_power": 340.0, "constants_preferred": "true" },
"offset": { "value": "0x90000000", "magnitude": "BYTES" },
"size": { "value": "32", "magnitude": "MBYTES" },
"mode": "USEMODE_RELATIVE"
},
{
"fname": "xSPI2",
"name": "octoFlash",
"fformat": "FORMAT_RAW",
"prop": { "rights": "ACC_READ", "throughput": "MID", "latency": "HIGH",
"byteWidth": 1, "freqRatio": 6.00, "cacheable": "CACHEABLE_ON",
"read_power": 110, "write_power": 400.0, "constants_preferred": "true" },
"offset": { "value": "0x70400000", "magnitude": "BYTES" },
"size": { "value": "64", "magnitude": "MBYTES" },
"mode": "USEMODE_RELATIVE"
}
...
“neural_art.json” file
The "%STEDGEAI_CORE_DIR%/scripts/N6_reloc/test" directory contains two examples of configuration files using the requested memory-pool descriptor files. They provide different profiles with different memory configurations, aligned with the generic AI test validation.
profile | description |
---|---|
test | Default profile. Full memories configuration and the epoch controller is not enabled |
test-ec | Default profile + support of the epoch controller |
test-int | Internal profile. NPU memories only configuration and the epoch controller is not enabled |
test-int-ec | Internal profile + support of the epoch controller |
test-ext | External profile. External memories only configuration and the epoch controller is not enabled |
test-ext-ec | External profile + support of the epoch controller |
Part of the ./test/neural_art_reloc.json file:
...
"test" : {
"memory_pool": "./mpools/stm32n6_reloc.mpool",
"options": "--native-float --mvei --cache-maintenance --Ocache-opt --enable-virtual-mem-pools --Os
--optimization 3 --Oauto-sched --all-buffers-info --csv-file network.csv"
},
"test-ec" : {
"memory_pool": "./mpools/stm32n6_reloc.mpool",
"options": "--native-float --mvei --cache-maintenance --Ocache-opt --enable-virtual-mem-pools --Os
--optimization 3 --Oauto-sched --all-buffers-info --csv-file network.csv --enable-epoch-controller"
},
...
Performance impacts
Accuracy
No difference with the nonrelocatable or static implementation.
Set up and inference time
For the inference time (after the install/init steps), no significant difference is expected versus a nonrelocatable or static implementation. Only the set-up time needed to install/create an instance of a given model is impacted. The installation time is directly proportional to the number of relocations and to the size of the code/data sections to copy, which depends on the selected COPY/XIP mode. Note that if the memory initializers need to be copied into the internal RAMs, this extra time is the same as for the static implementation.
- STM32N6570-DK board (DEV mode), overdrive clock setting (NPU 1 GHz, MCU 800 MHz).
- By default, the external flash is used to store the weights/params.
- Internal RAMs (AXIRAM3..6) are used to store the activations.
- The executable RAM region is located in the internal AXIRAM1 (this is equivalent to the static case where the application is also executed from the AXIRAM1).
yolov5 224 (nano) | static mode (absolute @ only) | reloc mode (copy) | reloc mode (xip) (*) |
---|---|---|---|
inference time (w/ ec) | 10.4 ms, 95.45 inf/s | 10.5 ms, 94.9 inf/s | 10.6 ms, 94.6 inf/s |
inference time (w/o ec) | 13.2 ms, 75.35 inf/s | 13.0 ms, 77.1 inf/s | 14.3 ms, 69.6 inf/s |
install/init time (ms) (w/ ec) | 0.0 / 0.03 | 11.7 / 0.73 | 10.9 / 0.93 |
install/init time (ms) (w/o ec) | 0.0 / 0.03 | 11.2 / 0.03 | 10.8 / 0.03 |
(*) With the epoch controller, the blob must be updated, so it is fetched from the executable RAM region (AXIRAM1). An impact is only observed when the configuration code is fetched from the external flash, that is, without epoch controller support.
For the reloc mode with epoch controller support, the 'install/init' time is mainly due to the copy of the blobs into the AXIRAM1 and to the relocations required to resolve the weights/params addresses in the different blobs.
Case where only the AXIRAMx is used for the activations and the weights/params (~2 Mbytes)
yolov5 224 (nano) | static mode (absolute @ only) | reloc mode (copy) | reloc mode (xip) |
---|---|---|---|
inference time (w/ ec) | 9.5 ms, 105.3 inf/s | 9.5 ms, 105.8 inf/s | 9.6 ms, 104.6 inf/s |
install/init time (ms) (w/ ec) | 17+ / 0.03 | 18.1 / 0.03 | 18.1 / 0.03 |
The 'install/init' time is similar in both cases. It mainly consists of the copy of the memory initializers from the flash location to the internal RAMs. No extra relocation for the weights/activations is required (all weights/activations addresses are absolute).
Memory layout overhead
In comparison with a static implementation, the relocation mode involves two additional sections, GOT and REL, which are used to support the position-independent code/data. Their size is directly proportional to the number of relocated references.
LL ATON runtime API extension
To enable the support of a runtime loadable model, the LL_ATON files should be compiled with the following C-define:
LL_ATON_RT_RELOC
The LL_ATON_RT_RELOC C-define activates the code paths and functionalities required to manage and install runtime loadable models. Ensuring that this macro is defined during compilation is crucial for the successful deployment and execution of runtime loadable models.
ll_aton_reloc_install()
int ll_aton_reloc_install(const uintptr_t file_ptr, const ll_aton_reloc_config *config,
NN_Instance_TypeDef *nn_instance);
Description
The ll_aton_reloc_install() function acts as a runtime dynamic loader. It is used to install and create an instance of a memory-mapped runtime loadable model. By providing the model image pointer (file_ptr), the configuration details, and the neural network instance, users can set up the model for execution. The function performs compatibility checks, initializes memory pools, and installs/relocates code and data sections as needed.
Parameters
- file_ptr: A uintptr_t value representing the pointer to the image of the runtime loadable model. This parameter specifies the location of the model image to be installed.
- config: A pointer to an ll_aton_reloc_config structure. This parameter provides the configuration details for how the model should be installed, including memory addresses and sizes.
- nn_instance: A pointer to an NN_Instance_TypeDef structure. This parameter is updated to handle the installed model, creating an instance of the neural network.
Return Value
- The function returns an integer value. A return value of 0 typically indicates success, while a nonzero value indicates that an error occurred during the installation process.
Steps Executed
This function installs and creates an instance of a runtime loadable model referenced by the file_ptr parameter. The config parameter (of type ll_aton_reloc_config) indicates how to install the model, and the nn_instance parameter (of type NN_Instance_TypeDef) is updated to handle the installed model. This function executes the following steps:
- Checking step
  - This step checks the compatibility of the binary object against the runtime environment (static part of the firmware). The main points checked include:
    - The version and content of the binary file header.
    - MCU type and whether the FPU (floating-point unit) is enabled (context/setting of the caller is used).
    - Secure or nonsecure context.
    - Whether the binary module has been compiled with the LL_ATON_EB_DBG_INFO or LL_ATON_RT_ASYNC C-defines.
    - Version of the used LL_ATON files.
- Memory-pool initialization step
  - If requested, this step initializes the used memory regions for the given model. Specifically:
    - If read/write memory pools handle the params/weights section, the associated memory region is initialized with values from the params/weights section.
    - Optionally, if the AI_RELOC_RT_LOAD_MODE_CLEAR flag is set, the read/write memory region handling the activations is cleared.
- Code/Data installation and relocation step
  - According to the AI_RELOC_RT_LOAD_MODE_COPY or AI_RELOC_RT_LOAD_MODE_XIP flag:
    - The code/data sections are copied into the executable RAM region.
    - The relocation process is performed to update references.
    - The callbacks are registered.
ll_aton_reloc_config C-struct
The purpose of the ll_aton_reloc_config C structure is to provide the parameters required to install a runtime loadable model.
typedef struct _ll_aton_reloc_config {
uintptr_t exec_ram_addr; /* base@ of the exec memory region to place the relocatable code/data (8-Bytes aligned) */
uint32_t exec_ram_size; /* max size in byte of the exec memory region */
uintptr_t ext_ram_addr; /* base@ of the external memory region to place the external pool (if requested) */
size_t ext_ram_size; /* max size in byte of the external memory region */
uintptr_t ext_param_addr; /* base@ of the param memory region (if requested) */
uint32_t mode;
} ll_aton_reloc_config;
- 'exec_ram_addr'/'exec_ram_size': These members indicate the base address (8-byte aligned) and the maximum size of the read/write executable RAM memory region. These parameters are mandatory. To determine the required size at runtime, the ll_aton_reloc_get_info function can be used.
- 'ext_ram_addr'/'ext_ram_size': These members indicate the base address (8-byte aligned) and the maximum size of the read/write external RAM memory region, if requested. To determine the required size at runtime, the ll_aton_reloc_get_info function can be used.
- 'ext_param_addr': This member indicates the base address (8-byte aligned) of the memory region containing the parameters/weights of the deployed model. This option is required when the --split option is used; otherwise, it must be set to NULL.
- 'mode': This member indicates the expected execution mode. Or-ed flags can be used. Either the AI_RELOC_RT_LOAD_MODE_XIP or the AI_RELOC_RT_LOAD_MODE_COPY flag is mandatory. The AI_RELOC_RT_LOAD_MODE_CLEAR flag is optional.
mode | description |
---|---|
AI_RELOC_RT_LOAD_MODE_XIP | XIP (Execute In Place) execution mode |
AI_RELOC_RT_LOAD_MODE_COPY | COPY execution mode |
AI_RELOC_RT_LOAD_MODE_CLEAR | Reset the used activation memory regions |
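Before calling the install function, an application may want to sanity-check the values it is about to place in the configuration structure. The following is a minimal sketch of such a check, not part of the LL_ATON API: the check_reloc_config() helper is hypothetical, and the SK_LOAD_MODE_* values are placeholders standing in for the AI_RELOC_RT_LOAD_MODE_* flags, whose real values come from the LL_ATON headers and may differ. The sketch also assumes that the XIP and COPY flags are distinct bits.

```c
#include <stdint.h>

/* Placeholder values for illustration only; the real flags are the
   AI_RELOC_RT_LOAD_MODE_* definitions of the LL_ATON headers. */
#define SK_LOAD_MODE_XIP   (1u << 0)
#define SK_LOAD_MODE_COPY  (1u << 1)
#define SK_LOAD_MODE_CLEAR (1u << 2)

/* Hypothetical pre-install check. Returns 0 when the values look
   consistent, a negative code otherwise. */
static int check_reloc_config(uintptr_t exec_ram_addr, uint32_t exec_ram_size,
                              uint32_t required_size, uint32_t mode)
{
  const uint32_t exec_flags = mode & (SK_LOAD_MODE_XIP | SK_LOAD_MODE_COPY);

  if ((exec_ram_addr & 0x7u) != 0u)
    return -1; /* base address is not 8-byte aligned */
  if (exec_ram_size < required_size)
    return -2; /* executable RAM region is too small */
  if (exec_flags != SK_LOAD_MODE_XIP && exec_flags != SK_LOAD_MODE_COPY)
    return -3; /* exactly one of XIP/COPY must be selected */

  return 0; /* the CLEAR flag is optional and not checked */
}
```

The required_size argument would typically be the rt_ram_xip or rt_ram_copy value reported by the ll_aton_reloc_get_info function, depending on the selected mode.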
ll_aton_reloc_set_callbacks()
int ll_aton_reloc_set_callbacks(const NN_Instance_TypeDef *nn_instance, const struct ll_aton_reloc_callback *cbs);
Description
The ll_aton_reloc_set_callbacks function is used to override the default callbacks registered by the ll_aton_reloc_install function. Calling this function is optional.
Callback services | description |
---|---|
assert/lib error | to implement the management of the errors generated by the embedded LL ATON functions |
NPU/MCU cache maintenance operations | to implement the NPU/MCU cache maintenance operations |
LL_ATON_LIB_xxx | to implement the LL ATON LIB services to support the hybrid epochs |
Parameters
- nn_instance: A pointer to the neural network instance (NN_Instance_TypeDef).
- cbs: A pointer to an ll_aton_reloc_callback structure (see the ll_aton_reloc_network.h file).
Return Value
- The function returns an integer value. A return value of 0 typically indicates success, while a nonzero value indicates that an error occurred during the operation.
ll_aton_reloc_get_info()
int ll_aton_reloc_get_info(const uintptr_t file_ptr, ll_aton_reloc_info *rt);
Description
The ll_aton_reloc_get_info function is used to obtain the main dimensioning information from the image of a runtime loadable model. This information can include details such as the size, memory requirements, and other relevant attributes of the model. By providing a pointer to the model image and a reference to an ll_aton_reloc_info structure, users can retrieve and store the necessary information to properly configure and manage the runtime loadable model.
This function is particularly useful for setting up the memory regions and ensuring that the model can be correctly loaded and executed within the available resources.
Parameters
- file_ptr: A uintptr_t value representing the pointer to the image of the runtime loadable model. This parameter specifies the location of the model image from which the information will be retrieved.
- rt: A pointer to an ll_aton_reloc_info structure. This parameter is used to store the retrieved dimensioning information of the runtime loadable model. The function fills this structure with the relevant details.
Return Value
- The function returns an integer value. A return value of 0 typically indicates success, while a nonzero value indicates that an error occurred during the operation.
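Once the dimensioning information is available, it can drive the choice of the execution mode. The sketch below shows one possible policy (prefer COPY when it fits, fall back to XIP otherwise), using the rt_ram_xip and rt_ram_copy members of the ll_aton_reloc_info structure described in the next section. The sketch_reloc_info type is a local stand-in carrying only these two members so that the example compiles on its own; sketch_mode and pick_exec_mode() are hypothetical names, and the policy itself is an assumption, not a recommendation from the runtime.

```c
#include <stdint.h>

/* Local stand-in with only the two members used here; the real
   structure is ll_aton_reloc_info. */
typedef struct {
  uint32_t rt_ram_xip;  /* RAM size requested in XIP mode  */
  uint32_t rt_ram_copy; /* RAM size requested in COPY mode */
} sketch_reloc_info;

typedef enum {
  SKETCH_MODE_NONE = 0, /* model cannot be installed */
  SKETCH_MODE_XIP,
  SKETCH_MODE_COPY
} sketch_mode;

/* Prefer COPY when the executable RAM region is large enough,
   otherwise fall back to XIP, otherwise report a failure. */
static sketch_mode pick_exec_mode(const sketch_reloc_info *info, uint32_t avail_ram)
{
  if (info->rt_ram_copy <= avail_ram)
    return SKETCH_MODE_COPY;
  if (info->rt_ram_xip <= avail_ram)
    return SKETCH_MODE_XIP;
  return SKETCH_MODE_NONE;
}
```

An application with a tight RAM budget could invert the preference and always try XIP first.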
ll_aton_reloc_info C-struct
typedef struct _ll_aton_reloc_info
{
const char *c_name; /* c-name of the model */
uint32_t variant; /* 32-b word to handle the reloc rt version,
the used ARM Embedded compiler,
Cortex-Mx (CPUID) and if the FPU is requested */
uint32_t code_sz; /* size of the code (header + txt + rodata + data + got + rel sections) */
uint32_t params_sz; /* size (in bytes) of the weights */
uint32_t acts_sz; /* minimum requested RAM size (in bytes) for the activations buffer */
uint32_t ext_ram_sz; /* requested external ram size for the activations (and params) */
uint32_t rt_ram_xip; /* minimum requested RAM size to install it, XIP mode */
uint32_t rt_ram_copy; /* minimum requested RAM size to install it, COPY mode */
const char *rt_version_desc; /* rt description */
uint32_t rt_version; /* rt version */
uint32_t rt_version_extra; /* rt version extra */
} ll_aton_reloc_info;
member | description |
---|---|
c_name | indicates the name of the model. |
variant | or-ed 32-bit value indicating the used Arm compiler, the CPUID of the Cortex®-M, and so on (see the ll_aton_reloc_network.h file) |
code_sz | size in bytes of all code/data sections representing the model: header+txt+rodata+data+got+rel sections |
params_sz | total size (in bytes) of the params/weights section |
acts_sz | total size (in bytes) of the activations |
ext_ram_sz | requested size (in bytes) of the external RAM memory |
rt_ram_xip | requested size (in bytes) of read/write execution memory region (XIP mode) |
rt_ram_copy | requested size (in bytes) of read/write execution memory region (COPY mode) |
rt_version_desc | (debug info) string describing the used LL runtime version |
rt_version | LL runtime version: bitwise OR of major << 24, minor << 16, and sub << 8 |
rt_version_extra | (debug info) extra dev version value |
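The packing of the rt_version member can be undone with a few shifts. The following sketch assumes that each of the major, minor, and sub fields occupies 8 bits, which is what the shift amounts suggest; the rt_version_fields type and the decode_rt_version() helper are hypothetical names introduced for this example.

```c
#include <stdint.h>

/* Hypothetical container for the unpacked version fields. */
typedef struct {
  uint8_t major;
  uint8_t minor;
  uint8_t sub;
} rt_version_fields;

/* Unpack a value built as major << 24 | minor << 16 | sub << 8.
   Each field is assumed to be 8 bits wide. */
static rt_version_fields decode_rt_version(uint32_t rt_version)
{
  rt_version_fields v;

  v.major = (uint8_t)((rt_version >> 24) & 0xFFu);
  v.minor = (uint8_t)((rt_version >> 16) & 0xFFu);
  v.sub   = (uint8_t)((rt_version >> 8) & 0xFFu);
  return v;
}
```

For instance, the value 0x01020300 decodes to major 1, minor 2, sub 3.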
ll_aton_reloc_get_mem_pool_desc()
ll_aton_reloc_mem_pool_desc *ll_aton_reloc_get_mem_pool_desc(const uintptr_t file_ptr, int index);
Description
The ll_aton_reloc_get_mem_pool_desc function allows users to obtain information about the parts of the memory pools used for a given model. By providing a pointer to the model image and an index, users can retrieve the necessary information through the returned ll_aton_reloc_mem_pool_desc structure.
Parameters
- file_ptr: A uintptr_t value representing the pointer to the image of the runtime loadable model. This parameter specifies the location of the model image from which the information will be retrieved.
- index: Index of the requested descriptor.
Return Value
- The function returns a pointer to an ll_aton_reloc_mem_pool_desc object. If the specified index is out of range, the function may return NULL.
Example
A typical snippet of code to display memory pool C-descriptors.
ll_aton_reloc_mem_pool_desc *mem_c_desc;
int index = 0;
/* Iterate over the descriptors until NULL is returned */
while ((mem_c_desc = ll_aton_reloc_get_mem_pool_desc((uintptr_t)bin, index)))
{
printf(" %d: flags=%x foff=%d dst=%x s=%d\n", index, mem_c_desc->flags,
mem_c_desc->foff, mem_c_desc->dst, mem_c_desc->size);
index++;
}
ll_aton_reloc_mem_pool_desc C-struct
typedef struct _ll_aton_reloc_mem_pool_desc
{
const char *name; /* name */
uint32_t flags; /* type definition: 32b:4x8b <type><data_type><reserved><id> */
uint32_t foff; /* offset in the binary file */
uint32_t dst; /* dst @ */
uint32_t size; /* real size */
} ll_aton_reloc_mem_pool_desc;
The AI_RELOC_MPOOL_GET_XXX(flags) macros (see the ll_aton_reloc_network.h file) can be used to retrieve the attributes of the memory pool.
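To make the layout of the flags member more concrete, the sketch below splits a 32-bit value into four 8-bit fields. It assumes that the fields named in the structure comment (type, data_type, reserved, id) are listed from the most significant byte to the least significant byte, which the document does not state explicitly. The mpool_flags_fields type and the split_mpool_flags() helper are hypothetical; production code should rely on the AI_RELOC_MPOOL_GET_XXX macros instead.

```c
#include <stdint.h>

/* Hypothetical view of the four 8-bit fields of the flags word. */
typedef struct {
  uint8_t type;
  uint8_t data_type;
  uint8_t reserved;
  uint8_t id;
} mpool_flags_fields;

/* Split the flags word, assuming <type> is the most significant byte
   and <id> the least significant one. */
static mpool_flags_fields split_mpool_flags(uint32_t flags)
{
  mpool_flags_fields f;

  f.type      = (uint8_t)((flags >> 24) & 0xFFu);
  f.data_type = (uint8_t)((flags >> 16) & 0xFFu);
  f.reserved  = (uint8_t)((flags >> 8) & 0xFFu);
  f.id        = (uint8_t)(flags & 0xFFu);
  return f;
}
```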
ll_aton_reloc_get_input/output_buffers_info()
const LL_Buffer_InfoTypeDef *ll_aton_reloc_get_input_buffers_info(const NN_Instance_TypeDef *nn_instance,
int32_t num);
const LL_Buffer_InfoTypeDef *ll_aton_reloc_get_output_buffers_info(const NN_Instance_TypeDef *nn_instance,
int32_t num);
Description
The ll_aton_reloc_get_input_buffers_info and ll_aton_reloc_get_output_buffers_info functions are used to obtain information about a specific input or output buffer of a neural network instance. This can be useful for understanding the structure and requirements of the input/output data of the neural network. By providing the neural network instance and the index of the desired buffer, users can retrieve detailed information about the buffer, such as its size, type, and memory location.
Parameters
- nn_instance: A pointer to the neural network instance (NN_Instance_TypeDef). This parameter specifies the neural network instance for which the input/output buffer information is to be retrieved.
- num: An integer specifying the index of the input/output buffer whose description is to be retrieved. The index is zero-based, meaning that num = 0 refers to the first buffer, num = 1 refers to the second buffer, and so on.
Return Value
- The function returns a pointer to an LL_Buffer_InfoTypeDef structure, which contains the description of the specified input/output buffer. If the specified buffer index is out of range, the function may return NULL.
ll_aton_reloc_set_input/output()
LL_ATON_User_IO_Result_t ll_aton_reloc_set_input(const NN_Instance_TypeDef *nn_instance, uint32_t num, void *buffer,
uint32_t size);
LL_ATON_User_IO_Result_t ll_aton_reloc_set_output(const NN_Instance_TypeDef *nn_instance, uint32_t num, void *buffer,
uint32_t size);
Description
The ll_aton_reloc_set_input and ll_aton_reloc_set_output functions are used to configure the address of the input and output buffers for a neural network instance, respectively. By providing the neural network instance, buffer index, buffer pointer, and buffer size, users can set up the necessary memory regions for input and output data.
Warning
These functions should only be used when the deployed model is generated with the '--no-inputs-allocation' and/or '--no-outputs-allocation' options, respectively.
Parameters
- nn_instance: A pointer to the neural network instance (NN_Instance_TypeDef). This parameter specifies the neural network instance for which the input/output buffer is to be set.
- num: An unsigned integer specifying the index of the input/output buffer to be set. The index is zero-based.
- buffer: A pointer to the buffer that will hold the input/output data. This parameter specifies the memory location where the data is stored.
- size: An unsigned integer specifying the size of the input/output buffer in bytes.
Return Value
- The function returns a value of type LL_ATON_User_IO_Result_t. This return value indicates the result of the operation, such as success or an error code.
ll_aton_reloc_get_input/output()
void *ll_aton_reloc_get_input(const NN_Instance_TypeDef *nn_instance, uint32_t num);
void *ll_aton_reloc_get_output(const NN_Instance_TypeDef *nn_instance, uint32_t num);
Description
The ll_aton_reloc_get_input and ll_aton_reloc_get_output functions are used to retrieve pointers to the input and output buffers for a neural network instance, respectively. By providing the neural network instance and buffer index, users can obtain direct access to the memory regions used for input and output data.
Warning
These functions should only be used when the deployed model is generated with the '--no-inputs-allocation' and/or '--no-outputs-allocation' options, respectively.
Parameters
- nn_instance: A pointer to the neural network instance (NN_Instance_TypeDef). This parameter specifies the neural network instance for which the input/output buffer pointer is to be retrieved.
- num: An unsigned integer specifying the index of the input/output buffer to be retrieved. The index is zero-based.
Return Value
- The function returns a pointer to the input/output buffer. If the specified buffer index is out of range or an error occurs, the function may return NULL.