Embedded ST Neural ART API
for STM32 target, based on ST Edge AI Core Technology 2.2.0
r1.1
Overview
This article describes the embedded inference client API and the
associated runtime software stack. The C application must use this
stack to exploit the specialized configuration files generated by
the ST Neural-ART compiler. Referred to as the 'network.c'
file in this article, this generated file contains a series of steps
(formally named epochs) that perform the inference, using the ST
Neural-ART NPU as much as possible.
As described in the “ST Neural-ART NPU concepts” article (“Scheduling” section), each epoch is decomposed into three phases:
- Preparation of the epoch: pre-op phase
- Execution of the epoch: hw-op phase
- Cleanup of the epoch: post-op phase
The 'network.c' file contains code that outlines the
actions to be performed in each step (epoch). To make this
configuration file usable in an embedded C project, a comprehensive
and efficient NPU runtime software stack is provided. This stack,
referred to as 'll_aton', supports various platforms through a simple
OSAL/port layer and offers numerous options to optimize execution
for different use cases.
This article presents the ll_aton
API, and some of
the important options that can be used.
ll_aton source location
$STEDGEAI_CORE_DIR/Middlewares/ST/AI/Npu/ll_aton /* generic files */
$STEDGEAI_CORE_DIR/Middlewares/ST/AI/Npu/Devices/STM32N6XX /* STM32N6-specific files */
$STEDGEAI_CORE_DIR indicates the root location where the ST Edge AI Core components are installed.
Layered stack
The full stack presented here contains many layers. From the user point of view, only one include is needed, and it exposes the user API. The contents of this reduced API are showcased below.
#include "ll_aton_rt_user_api.h"
Model instance
For a given model, tracking the current status of an inference requires describing each network with an instance. This approach is crucial for efficiently managing use cases where multiple networks need to be executed.
A model instance has the following type:
typedef struct
{
  const NN_Interface_TypeDef *network;
  NN_Execution_State_TypeDef exec_state;
} NN_Instance_TypeDef;
// with
typedef struct
{
  const char *network_name;
  NN_EC_Hook_TypeDef ec_network_init;
  NN_EC_Hook_TypeDef ec_inference_init;
  NN_InputSetter_TypeDef input_setter;
  NN_InputGetter_TypeDef input_getter;
  NN_OutputSetter_TypeDef output_setter;
  NN_OutputGetter_TypeDef output_getter;
  NN_EpochBlockItems_TypeDef epoch_block_items;
  NN_Buffers_Info_TypeDef output_buffers_info;
  NN_Buffers_Info_TypeDef input_buffers_info;
  NN_Buffers_Info_TypeDef internal_buffers_info;
} NN_Interface_TypeDef;
// and
typedef struct
{
const EpochBlock_ItemTypeDef *volatile current_epoch_block; // pointer to current epoch block
const EpochBlock_ItemTypeDef *volatile first_epoch_block; // pointer to first epoch block in current epoch list
const EpochBlock_ItemTypeDef *volatile next_epoch_block; // pointer to epoch block to be inserted
const EpochBlock_ItemTypeDef *volatile saved_current_epoch_block; // pointer to saved current epoch list
const EpochBlock_ItemTypeDef *volatile saved_first_epoch_block; // pointer to saved first epoch block in current epoch list
bool inference_started; // inference has been started
#if (LL_ATON_RT_MODE == LL_ATON_RT_ASYNC)
volatile uint32_t triggered_events; // currently triggered events/IRQs in current epoch
volatile bool current_epoch_block_started; // has current epoch block already been started
#endif // (LL_ATON_RT_MODE == LL_ATON_RT_ASYNC)
#ifndef NDEBUG
volatile uint32_t nr_of_epoch_blocks; // number of epoch blocks in network (includes also terminating empty epoch block)
volatile uint32_t saved_nr_of_epoch_blocks; // number of epoch blocks in saved network (includes also terminating empty epoch block)
#endif // NDEBUG
TraceEpochBlock_FuncPtr_t epoch_callback_function; // epoch callback function
#if defined(LL_ATON_RT_RELOC)
uint32_t inst_reloc;
#endif
} NN_Execution_State_TypeDef;
In short, the instance struct is one of the main entrypoints to get information on a network and its current execution state.
Such structures are needed as arguments for most of the runtime-level functions.
How to instantiate a model
To instantiate a model, one must know the “name” of the network
to use. It has been either forced during compilation of the network
through the ST Neural-ART compiler or is “Default” by default. (In
the 'network.c'
file, some symbols are suffixed by the
name of the network, for example,
'LL_ATON_EpochBlockItems_Default'
describe epochs for
the “Default” network).
The use of the macro is as follows:
// "Default" should be replaced by the name of the network.
// Defines NN_Instance_Default and NN_Interface_Default with network.c information
LL_ATON_DECLARE_NAMED_NN_INSTANCE_AND_INTERFACE(Default);
Runtime layer
As already presented, this layer provides the main functions to use to perform an inference.
LL_ATON_RT_RuntimeInit()
void LL_ATON_RT_RuntimeInit(void);
void LL_ATON_RT_RuntimeDeInit(void);
The init function shall always be called once when using this runtime, before performing any operation.
Mainly, this function:
- Initializes the ST Neural-ART NPU to a known state
- Initializes the ll_aton runtime to a known state
- Calls initialization functions if an OS is to be used
- Configures IRQ handling correctly
As obvious as it may sound, the DeInit function can be used to clean up the initializations done above.
LL_ATON_RT_Init_Network()
void LL_ATON_RT_Init_Network(NN_Instance_TypeDef *nn_instance);
void LL_ATON_RT_DeInit_Network(NN_Instance_TypeDef *nn_instance);
These are again initialization/deinitialization functions, this time used to set up the instance structures.
Here again, the init function shall be called before doing any inference to ensure that the instance is in a known state (this is the only goal of this function: make the instance clean and initialize all fields correctly). The DeInit function can be used to undo everything done (for example, to make the instance unusable, unrunnable).
It is to be noted that the function
void LL_ATON_RT_Reset_Network(NN_Instance_TypeDef *nn_instance);
can be used to “rewind” an initialized instance. This ensures that
the instance is ready to perform a new inference.
LL_ATON_RT_RunEpochBlock()
LL_ATON_RT_RetValues_t LL_ATON_RT_RunEpochBlock(NN_Instance_TypeDef *nn_instance);
Whenever an instance has been initialized, it is possible to start performing an inference with it.
LL_ATON_RT_RunEpochBlock
is called to trigger
execution of the “next epoch” of the current instance.
That means that, to perform a full inference, this function must be
called at least as many times as there are epochs.
Whenever called, this function automatically updates the fields
of the instance structure passed as an argument.
It has two main roles:
- If the epoch is not started, it starts it
- If the epoch is already running, it returns the current epoch
execution status:
  - LL_ATON_RT_NO_WFE: the next epoch block may be started; LL_ATON_OSAL_WFE() is not called.
  - LL_ATON_RT_WFE: the epoch block is still running; LL_ATON_OSAL_WFE() can be safely called.
  - LL_ATON_RT_DONE: the inference is over.
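To make the role of these return codes concrete, the loop below drives a mocked epoch runner. The enum values and the mock function are simplified stand-ins for illustration only, not the real ll_aton API:

```c
/* Simplified stand-in for LL_ATON_RT_RetValues_t (names mirror the real codes) */
typedef enum { RT_NO_WFE, RT_WFE, RT_DONE } rt_ret_t;

/* Mocked epoch runner: pretends the network has 3 epochs and that each
 * epoch needs one extra "wait" call while the NPU is busy. */
static int epoch_idx = 0, epoch_busy = 0;
static rt_ret_t mock_run_epoch_block(void)
{
  if (epoch_idx == 3) return RT_DONE;                  /* all epochs done: inference over */
  if (!epoch_busy) { epoch_busy = 1; return RT_WFE; }  /* epoch started, still running */
  epoch_busy = 0; epoch_idx++;                         /* epoch finished */
  return RT_NO_WFE;                                    /* next epoch block may be started */
}

/* The inference loop shape used throughout this article */
int run_inference_loop(void)
{
  rt_ret_t ret;
  int wfe_count = 0;
  do {
    ret = mock_run_epoch_block();
    if (ret == RT_WFE)
      wfe_count++; /* real code would call LL_ATON_OSAL_WFE() here */
  } while (ret != RT_DONE);
  return wfe_count;
}
```

With the real runtime, the loop body is identical; only the mocked runner is replaced by LL_ATON_RT_RunEpochBlock().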
LL_ATON_RT_SetNetworkCallback()
void LL_ATON_RT_SetEpochCallback(TraceEpochBlock_FuncPtr_t epoch_block_callback, NN_Instance_TypeDef *nn_instance); // will be deprecated
void LL_ATON_RT_SetNetworkCallback(NN_Instance_TypeDef *nn_instance, TraceEpochBlock_FuncPtr_t epoch_block_callback);
void LL_ATON_RT_SetRuntimeCallback(TraceRuntime_FuncPtr_t rt_callback);
The runtime environment provides a callback mechanism that allows following the setup and the execution of a network; the available callbacks are the following:
- runtime callback: when enabled, it is called on each runtime initialization or deinitialization operation
- epoch callback: when enabled, it is called on each network initialization, deinitialization, or execution step
The default values of the callback function pointers are NULL; they can be set using the following functions:
- LL_ATON_RT_SetRuntimeCallback
- LL_ATON_RT_SetNetworkCallback (equivalent of the LL_ATON_RT_SetEpochCallback function, which will shortly be deprecated)
In particular, the epoch callback can be useful to follow and debug the execution steps of a neural network; for example, it is possible to measure epoch timing or to dump intermediate activations. The epoch callback function is called with the following parameters:
- ctype: callback type. It can have the following values:
  - LL_ATON_RT_Callbacktype_PRE_START: first in-epoch execution step (called just before epoch preparation steps)
  - LL_ATON_RT_Callbacktype_POST_START: second in-epoch execution step (called just after epoch preparation steps)
  - LL_ATON_RT_Callbacktype_PRE_END: third in-epoch execution step (called just before epoch cleanup steps)
  - LL_ATON_RT_Callbacktype_POST_END: fourth (and last) in-epoch execution step (called just after epoch cleanup steps)
  - LL_ATON_RT_Callbacktype_NN_Init: network initialization or reset
  - LL_ATON_RT_Callbacktype_NN_DeInit: network deinitialization
  - LL_ATON_RT_Callbacktype_RT_Init: valid only for runtime callback
  - LL_ATON_RT_Callbacktype_RT_Deinit: valid only for runtime callback
- nn_instance: instance of the network
- eb: current epoch block info
The following code shows an example:
...
// Epoch callback setup
//LL_ATON_RT_SetEpochCallback ( EpochBlock_TraceCallBack , nn_instance_pnt ) ; // deprecated in future releases
LL_ATON_RT_SetNetworkCallback ( nn_instance_pnt, EpochBlock_TraceCallBack ) ;
....

void EpochBlock_TraceCallBack ( LL_ATON_RT_Callbacktype_t ctype , const NN_Instance_TypeDef *nn_instance , const EpochBlock_ItemTypeDef *eb )
{
  switch ( ctype ) {
    case LL_ATON_RT_Callbacktype_PRE_START :
      TOGGLE_EPOCH_PIN ; // Trace epoch pin for oscilloscope
      break ;
    case LL_ATON_RT_Callbacktype_POST_END :
      // If needed dump intermediate activations
      checkEpochOutputs ( eb -> epoch_num ) ;
      break ;
    case LL_ATON_RT_Callbacktype_NN_Init :
      // Capture network start time
      inferenceBaseTime = dwtGetCycles() ;
      break ;
    default :
      break ;
  } /* endswitch */
} // end of EpochBlock_TraceCallBack() function

void checkEpochOutputs ( uint32_t epochNum )
{
  for ( int idx = 0 ; idx < bufferListlen ; idx++ ) {
    if ( bufferList [ idx ] -> epoch == epochNum ) {
      // print the activations info and/or the activation data related to the current epoch
      // It is expected that bufferList and bufferListlen have been previously initialized
      printBuffer ( bufferList [ idx ] ) ;
    } /* endif */
  } /* endfor */
} // end of checkEpochOutputs() function
Warning
When using the epoch controller, many or even all epochs are
executed in a single execution step: in this case, the callback
function is not called on all epochs! If needed, the network can be
compiled (using the --ec-single-epoch option) to
generate a single execution step for each epoch, thus allowing the
execution to be followed also with the epoch controller.
Input/output buffers information
const LL_Buffer_InfoTypeDef *LL_ATON_Output_Buffers_Info(const NN_Instance_TypeDef *nn_instance);
const LL_Buffer_InfoTypeDef *LL_ATON_Input_Buffers_Info(const NN_Instance_TypeDef *nn_instance);
const LL_Buffer_InfoTypeDef *LL_ATON_Internal_Buffers_Info(const NN_Instance_TypeDef *nn_instance);
// And utilities:
unsigned char *LL_Buffer_addr_start(const LL_Buffer_InfoTypeDef *buf);
unsigned char *LL_Buffer_addr_end(const LL_Buffer_InfoTypeDef *buf);
uint32_t LL_Buffer_len(const LL_Buffer_InfoTypeDef *buf);
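These helpers resolve buffer addresses from a base address plus start/end offsets, as described by the buffer-info structure. A minimal sketch of that arithmetic, using a simplified stand-in struct (the real LL_Buffer_InfoTypeDef contains more fields):

```c
#include <stdint.h>

/* Simplified stand-in for LL_Buffer_InfoTypeDef: only the fields the
 * address/length helpers need. */
typedef struct {
  uintptr_t addr_base;    /* buffer base address */
  uint32_t offset_start;  /* offset of the buffer start from the base */
  uint32_t offset_end;    /* offset of the buffer end from the base (first byte beyond) */
} buf_info_t;

/* Plausible implementations of the three helpers, assuming the
 * documented base + offset layout */
static unsigned char *buf_addr_start(const buf_info_t *b)
{
  return (unsigned char *)(b->addr_base + b->offset_start);
}

static unsigned char *buf_addr_end(const buf_info_t *b)
{
  return (unsigned char *)(b->addr_base + b->offset_end);
}

static uint32_t buf_len(const buf_info_t *b)
{
  return b->offset_end - b->offset_start;
}
```

Note that the "end" address points at the first byte beyond the buffer, so the length is simply offset_end - offset_start.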
User-allocated Inputs/Outputs
// Inputs
LL_ATON_User_IO_Result_t LL_ATON_Set_User_Input_Buffer(const NN_Instance_TypeDef *nn_instance, uint32_t num, void *buffer, uint32_t size);
void *LL_ATON_Get_User_Input_Buffer(const NN_Instance_TypeDef *nn_instance, uint32_t num);
// Outputs
LL_ATON_User_IO_Result_t LL_ATON_Set_User_Output_Buffer(const NN_Instance_TypeDef *nn_instance, uint32_t num, void *buffer, uint32_t size);
void *LL_ATON_Get_User_Output_Buffer(const NN_Instance_TypeDef *nn_instance, uint32_t num);
LL_ATON_RT_Main()
void LL_ATON_RT_Main(NN_Instance_TypeDef *network_instance);
This function provides a template that shows the major steps to follow when performing an inference. As stated, the functions presented above are the main ones used to perform an inference; this template shows how to order them correctly.
A full execution template is also provided in this article, where the major steps are also found.
void LL_ATON_RT_Main(NN_Instance_TypeDef *network_instance)
{
  LL_ATON_RT_RetValues_t ll_aton_rt_ret;

  /*** Start of user initialization code ***/
  /*** End of user initialization code ***/

  assert(network_instance != NULL);
  assert(network_instance->network != NULL);

  LL_ATON_RT_RuntimeInit();                  // Initialize runtime
  LL_ATON_RT_Init_Network(network_instance); // Initialize passed network instance object

  do
  {
    /* Execute first/next step of Cube.AI/ATON runtime */
    ll_aton_rt_ret = LL_ATON_RT_RunEpochBlock(network_instance);

    /*** Start of user event handling code ***/
    /*** End of user event handling code ***/

    /* Wait for next event */
    if (ll_aton_rt_ret == LL_ATON_RT_WFE)
    { /*** subject to change to fit also user code requirements ***/
      LL_ATON_OSAL_WFE();
    }
  } while (ll_aton_rt_ret != LL_ATON_RT_DONE); /*** subject to change to fit also user code requirements ***/

  LL_ATON_RT_DeInit_Network(network_instance); // De-initialize the network instance object
  LL_ATON_RT_RuntimeDeInit();                  // De-initialize runtime

  /*** Start of user de-initialization code ***/
  /*** End of user de-initialization code ***/
}
Cache maintenance functions
As mentioned in the “ST Neural-ART NPU concepts” article, the system includes two data cache units:
- Microcontroller (MCU) data cache
- Neural Processor Unit (NPU) data cache
To deal with this, the NPU runtime provides the following cache maintenance functions. These functions should be used by the application before calling the inference to guarantee system memory coherency.
#include "ll_aton_caches_interface.h"
LL_ATON_Cache_MCU_Clean_Range(...);
LL_ATON_Cache_MCU_Invalidate_Range(...);
LL_ATON_Cache_MCU_Clean_Invalidate_Range(...);
LL_ATON_Cache_NPU_Clean_Range(...);
LL_ATON_Cache_NPU_Clean_Invalidate_Range(...);
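The usual pattern is to perform the maintenance operations before starting the inference: clean the CPU-written input range so the NPU reads up-to-date data, and invalidate the output range so the CPU does not later read stale cached results. The sketch below only illustrates that ordering with recording stubs; the function names are stand-ins, not the real ll_aton calls:

```c
#include <string.h>

/* Recording stubs standing in for the cache helpers and the inference
 * call: they only log the order of operations. */
static char ops_log[64];
static void log_op(const char *op) { strcat(ops_log, op); strcat(ops_log, ";"); }

static void mcu_clean_input_range(void)       { log_op("clean_in"); }
static void mcu_invalidate_output_range(void) { log_op("inval_out"); }
static void run_inference(void)               { log_op("infer"); }

/* Coherency sequence: input range is cleaned and output range is
 * invalidated before the NPU runs, mirroring the full example later
 * in this article. */
void coherent_inference(void)
{
  mcu_clean_input_range();
  mcu_invalidate_output_range();
  run_inference();
}
```

In a real application, the stubs are replaced by the LL_ATON_Cache_MCU_xxx_Range calls with the actual input/output address ranges.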
Operating System Abstraction Layer (OSAL)
ll_aton ships with interfaces for some RTOSes, easing
the integration work in such environments.
The supported operating systems are:
- Bare Metal (no OS)
- FreeRTOS™
- ThreadX
More detail on how to set up a project in such an environment is provided in the appropriate section.
Software library interface
A library is delivered in the middleware directory of the tool. This library is known as EmbedNets. It is a pure-software optimized library based on Arm Cortex®-M55 instructions and the MVE extension.
Some epochs cannot be executed by the ST Neural-ART NPU (for example, floating-point operations), and there is a need to call functions defined in EmbedNets.
The files ll_sw_float.c and ll_sw_integer.c handle the interface with the software library (optimized AI runtime library for the Arm Cortex-M family).
Stack configuration
The ST Neural-ART compiler is platform agnostic, and the generated
code can then be executed with the ll_aton code. When
building firmware for a given target, options (C-defines) must be
passed to ll_aton to configure its behavior
(see the 'll_aton_config.h' file).
Mandatory options
C-define | Description
---|---
LL_ATON_PLATFORM=LL_ATON_PLAT_STM32N6 | Forces use of the ll_aton implementation specific to the STM32N6
LL_ATON_SW_FALLBACK | Allows the use of software epochs (otherwise any network that requires software epochs does not compile)
LL_ATON_RT_MODE=LL_ATON_RT_ASYNC | Enables handling of events by the Neural-ART IRQ mechanism (the other acceptable value is LL_ATON_RT_POLLING –IRQ not handled– but this is not advised)
LL_ATON_OSAL=LL_ATON_OSAL_BARE_METAL | Enables a given OSAL (Operating System Abstraction Layer); bare metal means “no OS”. Other possible values include LL_ATON_OSAL_THREADX and LL_ATON_OSAL_FREERTOS
Useful options
C-define | Description
---|---
LL_ATON_DBG_BUFFER_INFO_EXCLUDED=1 | Prevents shipping all buffer information with the final code (only I/O buffers are described; this makes the final firmware lighter)
LL_ATON_EB_DBG_INFO | Exports extra information in epoch blocks (only for debug: consumes space in the final firmware)
LL_ATON_DUMP_DEBUG_API | Exports functions to ease debugging (for example, printing buffers through printf statements)
LL_ATON_ENABLE_CLOCK_GATING=1 | Enables clock gating (ON by default: ll_aton powers on only the parts of the Neural-ART IP that are needed, at epoch level. This option can be set to 0 to power up everything, every time; be careful about the impact on power consumption)
Casual operations
Getting Input/Output buffers information
This operation is useful in many essential use cases:
- To set data in the input buffers before an inference, one must get the address of the input buffers.
- To get data from the output buffers after an inference, one must get the address of the output buffers.
The following functions should be used to perform those actions:
const LL_Buffer_InfoTypeDef *LL_ATON_Output_Buffers_Info(const NN_Instance_TypeDef *nn_instance);
const LL_Buffer_InfoTypeDef *LL_ATON_Input_Buffers_Info(const NN_Instance_TypeDef *nn_instance);
const LL_Buffer_InfoTypeDef *LL_ATON_Internal_Buffers_Info(const NN_Instance_TypeDef *nn_instance);
// And utilities:
unsigned char *LL_Buffer_addr_start(const LL_Buffer_InfoTypeDef *buf);
unsigned char *LL_Buffer_addr_end(const LL_Buffer_InfoTypeDef *buf);
uint32_t LL_Buffer_len(const LL_Buffer_InfoTypeDef *buf);
All the LL_ATON_xxx_Buffers_Info functions return a pointer to
an array of LL_Buffer_InfoTypeDef, which can be explored
to retrieve the characteristics of all the tensors used during the
inference.
For example, filling the first input tensor of any given model
with 0xFF can be done with the following snippet:
#define INPUT_NR 0
#define INPUT_VAL 0xFF

LL_ATON_DECLARE_NAMED_NN_INSTANCE_AND_INTERFACE(Default);

const LL_Buffer_InfoTypeDef* inputs_ptr = LL_ATON_Input_Buffers_Info(&NN_Instance_Default); // Get a pointer to all input buffers info
const LL_Buffer_InfoTypeDef* first_input_ptr = &(inputs_ptr[INPUT_NR]); // Get a pointer to the first buffer info
memset((void*)LL_Buffer_addr_start(first_input_ptr), INPUT_VAL, LL_Buffer_len(first_input_ptr)); // Fill the first input tensor
To retrieve the associated tensor dimensions, the following fields
from the LL_Buffer_InfoTypeDef C-struct should be used
(note that this information is only valid for the input/output
buffers). Other fields contain internal information generated by the
NPU compiler for debugging purposes.
typedef struct
{
  const char *name;          /**< Buffer name. NULL if end of list */
  __LL_address_t addr_base;  /**< Buffer base address */
  uint32_t offset_start;     /**< Offset of the buffer start address from the base address */
  uint32_t offset_end;       /**< Offset of the buffer end address from the base address
                              * (first bytes address beyond buffer length) */
  ...
  const uint32_t *mem_shape; /**< shape as seen by the user in memory (only valid for input/output buffers) */
  uint16_t mem_ndims;        /**< Number of dimensions of mem_shape (Length of mem_shape) */
  ...
} LL_Buffer_InfoTypeDef;
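For example, the element count of an I/O tensor can be derived from mem_shape and mem_ndims; a small sketch with mock values (no real network involved):

```c
#include <stdint.h>

/* Element count described by a mem_shape/mem_ndims pair, as found in
 * LL_Buffer_InfoTypeDef for input/output buffers */
static uint32_t tensor_elements(const uint32_t *mem_shape, uint16_t mem_ndims)
{
  uint32_t n = 1;
  for (uint16_t i = 0; i < mem_ndims; i++)
    n *= mem_shape[i];  /* multiply all dimensions together */
  return n;
}
```

For a 1 x 256 x 256 x 3 input, this yields 196608 elements; multiplied by the element size, it gives the tensor's byte size.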
Connecting I/O buffers to the network
By default, the input and output buffers are allocated by the NPU
compiler in the provided memory pools.
As shown in the compiler description and its interaction with
STEdgeAI, it is possible to force the compiler not to generate such
buffers and let the user allocate them: --no-inputs-allocation
(resp. --no-outputs-allocation) lets the compiler not allocate
input buffers (resp. output buffers).
When this behavior is used, the generated network.c
contains functions to be used to “connect” the user-allocated
buffers to the network.
The use of those functions is as follows (for a case where, for
example, only --no-inputs-allocation has been used).
The following functions are of use when handling user-allocated input/output buffers:
// Inputs
LL_ATON_User_IO_Result_t LL_ATON_Set_User_Input_Buffer(const NN_Instance_TypeDef *nn_instance, uint32_t num, void *buffer, uint32_t size);
void *LL_ATON_Get_User_Input_Buffer(const NN_Instance_TypeDef *nn_instance, uint32_t num);
// Outputs
LL_ATON_User_IO_Result_t LL_ATON_Set_User_Output_Buffer(const NN_Instance_TypeDef *nn_instance, uint32_t num, void *buffer, uint32_t size);
void *LL_ATON_Get_User_Output_Buffer(const NN_Instance_TypeDef *nn_instance, uint32_t num);
The LL_ATON_Set_User_xxx_Buffer functions are used
to connect the user-allocated buffer (3rd argument) to the
num-th (2nd argument) input/output of the network instance
given as the first argument. The size argument is mandatory and
must match exactly the expected size (in bytes) of the given
tensor.
The LL_ATON_Get_User_xxx_Buffer functions are used the other
way around, to retrieve the address of the num-th (2nd
argument) input/output.
For example, to connect an input buffer to a network (named Default below), the following snippet can be used:
LL_ATON_DECLARE_NAMED_NN_INSTANCE_AND_INTERFACE(Default);

// Define buffers for Inputs (the input is an RGB image, 256 x 256 x 3 x 8 bits)
uint8_t input_buff[256 * 256 * 3];

// Connect buffer to the network first input.
LL_ATON_User_IO_Result_t res = LL_ATON_Set_User_Input_Buffer(&NN_Instance_Default, 0, input_buff, 256 * 256 * 3);
assert(res == LL_ATON_User_IO_NOERROR);
The user is responsible for correctly allocating the buffers:
there is no way for those functions to know whether the buffers are
correctly allocated, except by trusting the arguments provided by
the user.
The types and the sizes of the buffers must match what the model
expects.
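One way to keep the size argument consistent with the allocation is to derive both from the tensor shape. A sketch under that assumption; the commented-out call shows hypothetical usage with the real API (it needs the generated network):

```c
#include <stdint.h>
#include <stddef.h>

/* Expected byte size of a tensor: product of its dimensions times the
 * element size. Deriving both the allocation and the size argument
 * from one definition keeps them from drifting apart. */
static size_t tensor_bytes(const uint32_t *shape, uint16_t ndims, size_t elem_size)
{
  size_t n = elem_size;
  for (uint16_t i = 0; i < ndims; i++)
    n *= shape[i];
  return n;
}

/* Same 256 x 256 x 3 8-bit input as in the connection snippet */
static const uint32_t in_shape[] = { 256, 256, 3 };
static uint8_t input_buff[256 * 256 * 3];

/* Hypothetical usage with the real runtime:
 * LL_ATON_Set_User_Input_Buffer(&NN_Instance_Default, 0, input_buff,
 *                               tensor_bytes(in_shape, 3, sizeof(uint8_t)));
 */
```

Since Set_User_xxx_Buffer rejects a size mismatch, computing the size this way surfaces shape changes at a single point in the code.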
Using a Real-Time Operating System (RTOS)
ll_aton comes in RTOS-flavored versions that can be
activated through the LL_ATON_OSAL C-define.
When using an RTOS, ll_aton is mainly responsible
for protecting the ST Neural-ART NPU from concurrent accesses that
may cause issues (for example, if two threads are running an
inference). This is done by locking and releasing shared resources
using features of the OS (mutexes).
The two main supported RTOS are:
- ThreadX / Azure RTOS / Eclipse ThreadX - LL_ATON_OSAL_THREADX
- FreeRTOS™ - LL_ATON_OSAL_FREERTOS
In all cases, ll_aton:
- Adds OS-specific initialization routines (automatically called by LL_ATON_RT_RuntimeInit)
- Redefines wait-for-event/event generation using semaphores
- Adds specific tooling to handle multiple threads running inferences in parallel
- Hooks in the correct code to make interrupts work correctly with the OS
Minimal execution unit
When using an operating system, threads and epochs are scheduled according to priorities and timing: the runtime ensures that the following minimal execution units are not interrupted:
- epochs
- when using the epoch controller: blobs. With the epoch controller, one or more epochs are packed into a “blob”, a set of instructions scheduled by the epoch controller. If all epochs of a network have been packed into a single blob, the network is executed in one shot.
ThreadX
To enable the use of ThreadX:
- Add ThreadX to the project
- Add ll_aton_osal_threadx.c to the project
- Define LL_ATON_OSAL=LL_ATON_OSAL_THREADX
- Define the TX_HAS_PARALLEL_NETWORKS symbol:
  - TX_HAS_PARALLEL_NETWORKS=0 if only one thread is running an inference on the NPU
  - TX_HAS_PARALLEL_NETWORKS=1 if multiple threads are running an inference
- Ensure LL_ATON_RT_MODE=LL_ATON_RT_ASYNC is set
- Use the LL_ATON_OSAL_WFE macro in place of the _WFE in the network execution loop
- Use networks with separated memory spaces
FreeRTOS™
To enable the use of FreeRTOS™:
- Add the FreeRTOS™ suite to the project
- Add ll_aton_osal_freertos.c to the project
- Define LL_ATON_OSAL=LL_ATON_OSAL_FREERTOS
- Define the FREERTOS_HAS_PARALLEL_NETWORKS symbol:
  - FREERTOS_HAS_PARALLEL_NETWORKS=0 if only one thread is running an inference on the NPU
  - FREERTOS_HAS_PARALLEL_NETWORKS=1 if multiple threads are running an inference
- Ensure LL_ATON_RT_MODE=LL_ATON_RT_ASYNC is set
- Use the LL_ATON_OSAL_WFE macro in place of the _WFE in the network execution loop
- Use networks with separated memory spaces
Global tips
- It is possible to implement one’s own port for a given OS by using LL_ATON_OSAL=LL_ATON_OSAL_USER_IMPL and creating a file ll_aton_osal_user_impl.h containing the custom implementations needed by ll_aton. The macros to be defined are listed in ll_aton_osal.h.
- Be careful when doing multiple inferences in parallel: ensure that activation buffers from an ongoing network inference do not interfere with activation buffers from another ongoing network inference.
Example
Below is an example for FreeRTOS™, running two networks in two threads.
/* Define mutex and threads attributes */
osMutexId_t mutex_id ;
const osMutexAttr_t mutex_attr = {
  "myMutex", // name of the mutex
  0,         // attr_bits
  NULL,      // memory for control block
  0          // size for control block
} ;

osThreadId_t Thread1Handle ;
const osThreadAttr_t Thread1_attributes = {
  .name = "A",
  .priority = (osPriority_t) osPriorityNormal,
  .stack_size = 256 * 8
};

osThreadId_t Thread2Handle ;
const osThreadAttr_t Thread2_attributes = {
  .name = "B",
  .priority = (osPriority_t) osPriorityNormal,
  .stack_size = 256 * 8
};

// Main thread
void mainthread_func(void* arg)
{
  mutex_id = osMutexNew ( &mutex_attr ) ;

  if ( mutex_id == NULL ) {
    while ( 1 ) ;
  }

  Thread1Handle = osThreadNew ( thread1_func , NULL , &Thread1_attributes ) ;
  Thread2Handle = osThreadNew ( thread2_func , NULL , &Thread2_attributes ) ;

  while ( 1 ) ;
}

// thread 1: initialize and handle atonn_mnist network
void thread1_func(void* arg)
{
  int testok ;
  void* output_buffer ;
  void* input_buffer ;
  int output_buffer_len ;
  int input_buffer_len ;
  LL_ATON_RT_RetValues_t ll_aton_rt_ret ;

  LL_ATON_DECLARE_NAMED_NN_INSTANCE_AND_INTERFACE ( atonn_mnist ) ; // Network instance

  /* Get Input/Output buffers */
  const LL_Buffer_InfoTypeDef *inbufs = LL_ATON_Input_Buffers_Info_atonn_mnist();
  const LL_Buffer_InfoTypeDef *outbufs = LL_ATON_Output_Buffers_Info_atonn_mnist();

  input_buffer = (void*)(inbufs[0].addr_base.i + inbufs[0].offset_start);
  input_buffer_len = inbufs[0].offset_end - inbufs[0].offset_start;
  output_buffer = (void*)(outbufs[0].addr_base.i + outbufs[0].offset_start);
  output_buffer_len = outbufs[0].offset_end - outbufs[0].offset_start;

  LL_ATON_RT_Init_Network ( &NN_Instance_atonn_mnist ) ;

  testok = 1 ;

  while ( testok ) {
    get_atonn_mnist_features ( input_buffer , input_buffer_len ) ;

    LL_ATON_RT_Reset_Network ( &NN_Instance_atonn_mnist ) ;

    do {
      ll_aton_rt_ret = LL_ATON_RT_RunEpochBlock ( &NN_Instance_atonn_mnist ) ;

      if ( ll_aton_rt_ret == LL_ATON_RT_WFE ) {
        LL_ATON_OSAL_WFE () ;
      }
    } while ( ll_aton_rt_ret != LL_ATON_RT_DONE ) ;

    post_process_atonn_mnist_inference ( output_buffer , output_buffer_len ) ;
  }
}

// thread 2: initialize and handle atonn_mnist5x5 network
void thread2_func(void* arg)
{
  int testok ;
  void* output_buffer ;
  void* input_buffer ;
  int output_buffer_len ;
  int input_buffer_len ;
  LL_ATON_RT_RetValues_t ll_aton_rt_ret ;

  LL_ATON_DECLARE_NAMED_NN_INSTANCE_AND_INTERFACE ( atonn_mnist5x5 ) ; // Network instance

  /* Get Input/Output buffers */
  const LL_Buffer_InfoTypeDef *inbufs = LL_ATON_Input_Buffers_Info_atonn_mnist5x5();
  const LL_Buffer_InfoTypeDef *outbufs = LL_ATON_Output_Buffers_Info_atonn_mnist5x5();

  input_buffer = (void*)(inbufs[0].addr_base.i + inbufs[0].offset_start);
  input_buffer_len = inbufs[0].offset_end - inbufs[0].offset_start;
  output_buffer = (void*)(outbufs[0].addr_base.i + outbufs[0].offset_start);
  output_buffer_len = outbufs[0].offset_end - outbufs[0].offset_start;

  LL_ATON_RT_Init_Network ( &NN_Instance_atonn_mnist5x5 ) ;

  testok = 1 ;

  while ( testok ) {
    get_atonn_mnist5x5_features ( input_buffer , input_buffer_len ) ;

    LL_ATON_RT_Reset_Network ( &NN_Instance_atonn_mnist5x5 ) ;

    do {
      ll_aton_rt_ret = LL_ATON_RT_RunEpochBlock ( &NN_Instance_atonn_mnist5x5 ) ;

      if ( ll_aton_rt_ret == LL_ATON_RT_WFE ) {
        LL_ATON_OSAL_WFE () ;
      }
    } while ( ll_aton_rt_ret != LL_ATON_RT_DONE ) ;

    post_process_atonn_mnist5x5_inference ( output_buffer , output_buffer_len ) ;
  }
}
All references to non-reentrant functions or to shared resources need to be protected with the mutex:
osMutexAcquire ( mutex_id , osWaitForever ) ;
....
osMutexRelease ( mutex_id ) ;
Full simple code example for running an inference, using the “ll_aton” runtime API
#include "ll_aton_runtime.h"
#include "ll_aton_caches_interface.h"

// Network instance declaration
LL_ATON_DECLARE_NAMED_NN_INSTANCE_AND_INTERFACE ( Default ) ;

NN_Interface_TypeDef *NN_InterfacePnt ;
NN_Instance_TypeDef *NN_InstancePnt ;

void* input_start_address ;
void* input_end_address ;
void* output_start_address ;
void* output_end_address ;

void main ( void )
{
  applicationSetup () ; // system initialization

  networkSetup () ;

  // Enter inference loop
  while ( 1 ) {
    getFeature ( input_start_address ) ; // Wait for input data from somewhere

    featurePreProcess ( input_start_address ) ; // Feature pre-processing (if any)

    inferenceExecute ( NN_InstancePnt ) ;

    inferencePostProcess ( output_start_address ) ; // Inference post-processing (if any)
  } /* endwhile */
}

void networkSetup ( void )
{
  LL_ATON_RT_RuntimeInit () ; // Initialize runtime

  // Set references to the network "Default"
  NN_InterfacePnt = (NN_Interface_TypeDef *) &NN_Interface_Default ;
  NN_InstancePnt = (NN_Instance_TypeDef *) &NN_Instance_Default ;

  // Initialize network
  LL_ATON_RT_Init_Network ( NN_InstancePnt ) ;

  // Set call back (if any)
  //LL_ATON_RT_SetEpochCallback ( EpochBlock_TraceCallBack , NN_InstancePnt ) ; // deprecated in future releases
  LL_ATON_RT_SetNetworkCallback ( NN_InstancePnt, EpochBlock_TraceCallBack ) ;

  // Get I/O buffers addresses
  const LL_Buffer_InfoTypeDef * buffersInfos = NN_InterfacePnt -> input_buffers_info();
  input_start_address = LL_Buffer_addr_start(&buffersInfos[0]);
  input_end_address = LL_Buffer_addr_end(&buffersInfos[0]);

  buffersInfos = NN_InterfacePnt -> output_buffers_info();

  output_start_address = LL_Buffer_addr_start(&buffersInfos[0]);
  output_end_address = LL_Buffer_addr_end(&buffersInfos[0]);
}

int inferenceExecute ( NN_Instance_TypeDef *networkInstancePnt )
{
  LL_ATON_RT_RetValues_t ll_aton_rt_ret ;

  if ( networkInstancePnt == NULL ) {
    return ( -1 ) ;
  } /* endif */

  LL_ATON_Cache_NPU_Invalidate () ; /* if NPU cache is used */
  LL_ATON_Cache_MCU_Clean_Invalidate_Range ( input_start_address, input_end_address - input_start_address ) ;
  LL_ATON_Cache_MCU_Invalidate_Range ( output_start_address, output_end_address - output_start_address ) ;

  LL_ATON_RT_Reset_Network ( networkInstancePnt ) ; // Reset the network instance object

  do
  {
    /* Execute first/next step of Cube.AI/ATON runtime */
    ll_aton_rt_ret = LL_ATON_RT_RunEpochBlock ( networkInstancePnt ) ;

    /* Wait for next event */
    if ( ll_aton_rt_ret == LL_ATON_RT_WFE )
    { /*** subject to change to fit also user code requirements ***/
      LL_ATON_OSAL_WFE () ;
    }
  } while ( ll_aton_rt_ret != LL_ATON_RT_DONE ) ;

  return ( 0 ) ;
}
Encryption using the API
Details about how to achieve / use encryption can be found in the encryption article.