ST Edge AI Core

Embedded ST Neural ART API

for STM32 target, based on ST Edge AI Core Technology 2.2.0

r1.1

Overview

This article describes the embedded inference client API and the associated runtime software stack. The C application must use this stack to exploit the specialized configuration files generated by the ST Neural-ART compiler. The main generated file, referred to as 'network.c' in this article, contains a series of steps (formally named epochs) to perform an inference using the ST Neural-ART NPU as much as possible.

As described in the “ST Neural-ART NPU concepts” article (“Scheduling” section), each epoch is decomposed into three phases.

The 'network.c' file contains code that outlines the actions to be performed in each step (epoch). To make this configuration file useful in an embedded C project, a comprehensive and efficient NPU runtime software stack is provided. This stack, referred to as 'll_aton', supports various platforms through a simple OSAL/port layer and offers numerous options to optimize execution for different use cases.

This article presents the ll_aton API and some of the important options that can be used.

ll_aton source location

$STEDGEAI_CORE_DIR/Middlewares/ST/AI/Npu/ll_aton             /* generic files */
$STEDGEAI_CORE_DIR/Middlewares/ST/AI/Npu/Devices/STM32N6XX   /* STM32N6-specific files */

$STEDGEAI_CORE_DIR indicates the root location where the ST Edge AI Core components are installed.

Layered stack

The full stack presented here contains many layers. From the user's point of view, however, only one include is needed; it exposes the user API, whose contents are showcased below.

#include "ll_aton_rt_user_api.h"

Model instance

For a given model, tracking the current status of an inference requires describing each network with an instance. This approach is crucial for efficiently managing use cases where multiple networks need to be executed.

A model instance has the following type:

typedef struct
{
  const NN_Interface_TypeDef *network;
  NN_Execution_State_TypeDef exec_state;
} NN_Instance_TypeDef;

// with
typedef struct
{
  const char *network_name;
  NN_EC_Hook_TypeDef ec_network_init;
  NN_EC_Hook_TypeDef ec_inference_init;
  NN_InputSetter_TypeDef input_setter;
  NN_InputGetter_TypeDef input_getter;
  NN_OutputSetter_TypeDef output_setter;
  NN_OutputGetter_TypeDef output_getter;
  NN_EpochBlockItems_TypeDef epoch_block_items;
  NN_Buffers_Info_TypeDef output_buffers_info;
  NN_Buffers_Info_TypeDef input_buffers_info;
  NN_Buffers_Info_TypeDef internal_buffers_info;
} NN_Interface_TypeDef;

// and
typedef struct
{
  const EpochBlock_ItemTypeDef *volatile current_epoch_block; // pointer to current epoch block
  const EpochBlock_ItemTypeDef *volatile first_epoch_block;   // pointer to first epoch block in current epoch list
  const EpochBlock_ItemTypeDef *volatile next_epoch_block;    // pointer to epoch block to be inserted

  const EpochBlock_ItemTypeDef *volatile saved_current_epoch_block; // pointer to saved current epoch list
  const EpochBlock_ItemTypeDef *volatile saved_first_epoch_block;   // pointer to saved first epoch block in current epoch list

  bool inference_started; // inference has been started

#if (LL_ATON_RT_MODE == LL_ATON_RT_ASYNC)
  volatile uint32_t triggered_events;        // currently triggered events/IRQs in current epoch
  volatile bool current_epoch_block_started; // has current epoch block already been started
#endif                                         // (LL_ATON_RT_MODE == LL_ATON_RT_ASYNC)

#ifndef NDEBUG
  volatile uint32_t nr_of_epoch_blocks; // number of epoch blocks in network (includes also terminating empty epoch block)
  volatile uint32_t saved_nr_of_epoch_blocks; // number of epoch blocks in saved network (includes also terminating empty epoch block)
#endif                                          // NDEBUG

  TraceEpochBlock_FuncPtr_t epoch_callback_function; // epoch callback function

#if defined(LL_ATON_RT_RELOC)
  uint32_t inst_reloc;
#endif

} NN_Execution_State_TypeDef;

In short, the instance struct is one of the main entrypoints to get information on a network and its current execution state.

Such structures are needed as arguments for most of the runtime-level functions.
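For example, given an instance declared as shown in the next section, the interface and the execution state can be inspected directly (a minimal sketch, reusing the field names from the structures above):

const char *name = NN_Instance_Default.network->network_name;        // for example, "Default"
bool started     = NN_Instance_Default.exec_state.inference_started; // set once an inference has been started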

How to instantiate a model

To instantiate a model, one must know the “name” of the network to use. This name has either been forced during compilation of the network through the ST Neural-ART compiler, or is “Default” by default. (In the 'network.c' file, some symbols are suffixed by the name of the network; for example, 'LL_ATON_EpochBlockItems_Default' describes the epochs of the “Default” network.)

The use of the macros is as follows:

  // "Default" should be replaced by the name of the network.
  // Defines NN_Instance_Default and NN_Interface_Default with the network.c information
  LL_ATON_DECLARE_NAMED_NN_INSTANCE_AND_INTERFACE(Default) 

Runtime layer

As already presented, this layer provides the main functions used to perform an inference.

LL_ATON_RT_RuntimeInit()

void LL_ATON_RT_RuntimeInit(void);
void LL_ATON_RT_RuntimeDeInit(void);

The init function shall always be called once whenever using this runtime, before performing any operation.

Mainly, this function:

  • Initializes the ST Neural-ART NPU to a known state
  • Initializes the ll_aton runtime to a known state
    • Calls OS initialization functions if an OS is used
    • Configures IRQ handling correctly

As obvious as it may sound, the DeInit function can be used to undo the initializations described above.
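As a minimal sketch of the expected pairing (using only functions presented in this article):

LL_ATON_RT_RuntimeInit();   /* once, before any other ll_aton operation */

/* ... initialize network instances and run inferences ... */

LL_ATON_RT_RuntimeDeInit(); /* once, when the NPU runtime is no longer needed */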

LL_ATON_RT_Init_Network()

void LL_ATON_RT_Init_Network(NN_Instance_TypeDef *nn_instance);
void LL_ATON_RT_DeInit_Network(NN_Instance_TypeDef *nn_instance);

These are again initialization/deinitialization functions, this time used to set up the instance structures.

Here again, the init function shall be called before doing any inference, to ensure that the instance is in a known state (this is the only goal of this function: making the instance clean and initializing all fields correctly). The DeInit function can be used to undo all of this (for example, to make the instance unusable and unrunnable).

Note that the function void LL_ATON_RT_Reset_Network(NN_Instance_TypeDef *nn_instance); can be used to “rewind” an initialized instance. This ensures that the instance is ready to perform a new inference.
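A minimal sketch of the expected life cycle, reusing the instance declared earlier for the “Default” network:

LL_ATON_RT_Init_Network(&NN_Instance_Default);   // once, before the first inference
/* ... run a first inference ... */
LL_ATON_RT_Reset_Network(&NN_Instance_Default);  // rewind before each new inference
/* ... run the next inference ... */
LL_ATON_RT_DeInit_Network(&NN_Instance_Default); // when the instance is no longer needed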

LL_ATON_RT_RunEpochBlock()

LL_ATON_RT_RetValues_t LL_ATON_RT_RunEpochBlock(NN_Instance_TypeDef *nn_instance);

Whenever an instance has been initialized, it is possible to start performing an inference with it.

LL_ATON_RT_RunEpochBlock is called to trigger the execution of the “next epoch” of the given instance.
This means that, to perform a full inference, this function must be called at least as many times as there are epochs.

Whenever called, this function automatically updates the fields of the instance structure passed as an argument.
It has two main roles:

  • If the epoch is not started, it starts it
  • If the epoch is already running, it returns the current epoch execution status:
    • LL_ATON_RT_NO_WFE: the next epoch block may be started; LL_ATON_OSAL_WFE() is not called.
    • LL_ATON_RT_WFE: the epoch block is still running; LL_ATON_OSAL_WFE() can be safely called.
    • LL_ATON_RT_DONE: the inference is over.
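These return values lead directly to the canonical execution loop (the same pattern is used by LL_ATON_RT_Main, shown later in this article):

LL_ATON_RT_RetValues_t ret;

do {
  ret = LL_ATON_RT_RunEpochBlock(&NN_Instance_Default);

  if (ret == LL_ATON_RT_WFE) {
    LL_ATON_OSAL_WFE(); // sleep until the NPU signals the end of the current epoch block
  }
} while (ret != LL_ATON_RT_DONE);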

LL_ATON_RT_SetNetworkCallback()

void LL_ATON_RT_SetEpochCallback(TraceEpochBlock_FuncPtr_t epoch_block_callback, NN_Instance_TypeDef *nn_instance);   // will be deprecated
void LL_ATON_RT_SetNetworkCallback(NN_Instance_TypeDef *nn_instance, TraceEpochBlock_FuncPtr_t epoch_block_callback);
void LL_ATON_RT_SetRuntimeCallback(TraceRuntime_FuncPtr_t rt_callback);

The runtime environment provides a callback mechanism that allows the application to follow the setup and the execution of a network; the available callbacks are the following:

  • runtime callback: when enabled, it is called on each runtime initialization or deinitialization operation
  • epoch callback: when enabled, it is called on each network initialization, deinitialization, or execution step

The default values of callback function pointers are NULL and can be set using the following functions:

  • LL_ATON_RT_SetRuntimeCallback
  • LL_ATON_RT_SetNetworkCallback (equivalent to the LL_ATON_RT_SetEpochCallback function, which will soon be deprecated)

In particular, the epoch callback can be useful to follow and debug the execution steps of a neural network; for example, it is possible to measure epoch timing or to dump intermediate activations. The epoch callback function is called with the following parameters:

  • ctype: callback type. It can have the following values:
    • LL_ATON_RT_Callbacktype_PRE_START: first in-epoch execution step (called just before epoch preparation steps)
    • LL_ATON_RT_Callbacktype_POST_START: second in-epoch execution step (called just after epoch preparation steps)
    • LL_ATON_RT_Callbacktype_PRE_END: third in-epoch execution step (called just before epoch cleanup steps)
    • LL_ATON_RT_Callbacktype_POST_END: fourth (and last) in-epoch execution step (called just after epoch cleanup steps)
    • LL_ATON_RT_Callbacktype_NN_Init: network initialization or reset
    • LL_ATON_RT_Callbacktype_NN_DeInit: network deinitialization
    • LL_ATON_RT_Callbacktype_RT_Init: valid only for runtime callback
    • LL_ATON_RT_Callbacktype_RT_Deinit: valid only for runtime callback
  • nn_instance: instance of the network
  • eb: current execution block info

The following code shows an example:

  ...
  // Epoch callback setup
  LL_ATON_RT_SetEpochCallback ( EpochBlock_TraceCallBack , nn_instance_pnt ) ; // deprecated in future releases
  LL_ATON_RT_SetNetworkCallback ( nn_instance_pnt, EpochBlock_TraceCallBack ) ;
  ...


void EpochBlock_TraceCallBack ( LL_ATON_RT_Callbacktype_t ctype , const NN_Instance_TypeDef *nn_instance , const EpochBlock_ItemTypeDef *eb )
{
  switch ( ctype ) {
           case LL_ATON_RT_Callbacktype_PRE_START :
                TOGGLE_EPOCH_PIN ; // Trace epoch pin for oscilloscope
           break ;
           case LL_ATON_RT_Callbacktype_POST_END :

                // If needed dump intermediate activations
                checkEpochOutputs ( eb -> epoch_num ) ;
           break ;
           case LL_ATON_RT_Callbacktype_NN_Init :

                // Capture network start time
                inferenceBaseTime = dwtGetCycles() ;
           break ;
           default :
                return ;
           break ;
  } /* endswitch */
} // end of EpochBlock_TraceCallBack() function

void checkEpochOutputs ( uint32_t epochNum )
{
  for ( int idx = 0 ; idx < bufferListlen ; idx++ ) {

        if ( bufferList [ idx ] -> epoch == epochNum ) {
             // print the activations info and/or the activation data related to the current epoch
             // It is expected that bufferList and bufferListlen have been previously initialized
             printBuffer ( bufferList [ idx ]  ) ;
        } /* endif */
  } /* endfor */
  
} // end of checkEpochOutputs() function

Warning

When using the epoch controller, many or even all epochs are executed in a single execution step; in this case, the callback function is not called on all epochs! If needed, the network can be compiled with the --ec-single-epoch option to generate a single execution step for each epoch, thus making it possible to follow the execution also with the epoch controller.

Input/output buffer information

const LL_Buffer_InfoTypeDef *LL_ATON_Output_Buffers_Info(const NN_Instance_TypeDef *nn_instance);
const LL_Buffer_InfoTypeDef *LL_ATON_Input_Buffers_Info(const NN_Instance_TypeDef *nn_instance);
const LL_Buffer_InfoTypeDef *LL_ATON_Internal_Buffers_Info(const NN_Instance_TypeDef *nn_instance);
// And utilities:
unsigned char *LL_Buffer_addr_start(const LL_Buffer_InfoTypeDef *buf);
unsigned char *LL_Buffer_addr_end(const LL_Buffer_InfoTypeDef *buf);
uint32_t LL_Buffer_len(const LL_Buffer_InfoTypeDef *buf);

See the “Getting input/output buffer information” section below.

User-allocated Inputs/Outputs

// Inputs
LL_ATON_User_IO_Result_t LL_ATON_Set_User_Input_Buffer(const NN_Instance_TypeDef *nn_instance, uint32_t num, void *buffer, uint32_t size);
void *LL_ATON_Get_User_Input_Buffer(const NN_Instance_TypeDef *nn_instance, uint32_t num);
// Outputs
LL_ATON_User_IO_Result_t LL_ATON_Set_User_Output_Buffer(const NN_Instance_TypeDef *nn_instance, uint32_t num, void *buffer, uint32_t size);
void *LL_ATON_Get_User_Output_Buffer(const NN_Instance_TypeDef *nn_instance, uint32_t num);

See the “Connecting I/O buffers to the network” section below.

LL_ATON_RT_Main()

void LL_ATON_RT_Main(NN_Instance_TypeDef *network_instance);

This function provides a template that shows the major steps of an inference. As stated, the functions presented above are the main ones needed to perform an inference; this template shows how to order them correctly.

A full execution template, containing the same major steps, is also provided at the end of this article.

void LL_ATON_RT_Main(NN_Instance_TypeDef *network_instance)
{
  LL_ATON_RT_RetValues_t ll_aton_rt_ret;

  /*** Start of user initialization code ***/

  /*** End of user initialization code ***/

  assert(network_instance != NULL);
  assert(network_instance->network != NULL);
  LL_ATON_RT_RuntimeInit();                  // Initialize runtime
  LL_ATON_RT_Init_Network(network_instance); // Initialize passed network instance object

  do
  {
    /* Execute first/next step of Cube.AI/ATON runtime */
    ll_aton_rt_ret = LL_ATON_RT_RunEpochBlock(network_instance);

    /*** Start of user event handling code ***/

    /*** End of user event handling code ***/

    /* Wait for next event */
    if (ll_aton_rt_ret == LL_ATON_RT_WFE)
    { /*** subject to change to fit also user code requirements ***/
      LL_ATON_OSAL_WFE();
    }
  } while (ll_aton_rt_ret != LL_ATON_RT_DONE); /*** subject to change to fit also user code requirements ***/

  LL_ATON_RT_DeInit_Network(network_instance); // De-initialize the network instance object
  LL_ATON_RT_RuntimeDeInit();                  // De-initialize runtime

  /*** Start of user de-initialization code ***/

  /*** End of user de-initialization code ***/
}

Cache maintenance functions

As mentioned in the “ST Neural-ART NPU concepts” article, the system includes two data cache units:

  • Microcontroller (MCU) data cache
  • Neural Processor Unit (NPU) data cache

To deal with this, the NPU runtime provides the following cache maintenance functions. These functions should be used by the application before calling the inference to guarantee system memory coherency.

#include "ll_aton_caches_interface.h"

LL_ATON_Cache_MCU_Clean_Range(...);
LL_ATON_Cache_MCU_Invalidate_Range(...);
LL_ATON_Cache_MCU_Clean_Invalidate_Range(...);
LL_ATON_Cache_NPU_Clean_Range(...);
LL_ATON_Cache_NPU_Clean_Invalidate_Range(...);
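A typical pattern, sketched here under the assumption that the MCU data cache covers the user I/O buffers (input_addr, output_addr, and the length variables are placeholder names):

/* Make CPU-written input data visible to the NPU */
LL_ATON_Cache_MCU_Clean_Invalidate_Range(input_addr, input_len);

/* Discard any stale CPU cache lines covering the output buffer, so that
   reads performed after the inference fetch the NPU results from memory */
LL_ATON_Cache_MCU_Invalidate_Range(output_addr, output_len);

/* ... run the inference ... */

The same pattern appears in the full code example at the end of this article.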

Operating System Abstraction Layer (OSAL)

ll_aton ships with interfaces for some RTOS, easing the integration work when working in such environments.
The supported operating systems are:

  • Bare Metal (no OS)
  • FreeRTOS™
  • ThreadX

More detail on how to set up a project in such an environment is provided in the appropriate section.

Software library interface

A library is delivered in the middleware directory of the tool. This library, known as EmbedNets, is a pure software library optimized for the Arm Cortex®-M55 instruction set and its MVE extension.

Some epochs cannot be executed by the ST Neural-ART NPU (for example, floating-point operations); for those epochs, functions defined in EmbedNets are called instead.

The files ll_sw_float.c and ll_sw_integer.c handle the interface with the software library (an optimized AI runtime library for the Arm Cortex-M family).

Stack configuration

The ST Neural-ART compiler is platform-agnostic, and the generated code can then be executed with the ll_aton code. When building firmware for a given target, options (C-defines) must be passed to ll_aton to configure its behavior.

(See 'll_aton_config.h' file)

Mandatory options

  • LL_ATON_PLATFORM=LL_ATON_PLAT_STM32N6: forces the use of the STM32N6-specific implementation of ll_aton
  • LL_ATON_SW_FALLBACK: allows the use of software epochs (otherwise, any network that requires software epochs does not compile)
  • LL_ATON_RT_MODE=LL_ATON_RT_ASYNC: enables handling of events by the Neural-ART IRQ mechanism (the other acceptable value is LL_ATON_RT_POLLING –IRQ not handled– but this is not advised)
  • LL_ATON_OSAL=LL_ATON_OSAL_BARE_METAL: enables a given OSAL (operating system abstraction layer); bare metal means “no OS” (other possible values include LL_ATON_OSAL_THREADX and LL_ATON_OSAL_FREERTOS)
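For example, a bare-metal STM32N6 build could pass these defines on the compiler command line (the exact toolchain invocation is project-specific; this is only a sketch):

arm-none-eabi-gcc ... \
    -DLL_ATON_PLATFORM=LL_ATON_PLAT_STM32N6 \
    -DLL_ATON_SW_FALLBACK \
    -DLL_ATON_RT_MODE=LL_ATON_RT_ASYNC \
    -DLL_ATON_OSAL=LL_ATON_OSAL_BARE_METAL \
    ...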

Useful options

  • LL_ATON_DBG_BUFFER_INFO_EXCLUDED=1: prevents shipping all buffer information with the final code (only I/O buffers are described; this makes the final firmware lighter)
  • LL_ATON_EB_DBG_INFO: exports extra information in epoch blocks (only for debug: consumes space in the final firmware)
  • LL_ATON_DUMP_DEBUG_API: exports functions to ease debugging (for example, printing buffers through printf statements)
  • LL_ATON_ENABLE_CLOCK_GATING=1: enables clock gating (ON by default: ll_aton powers on only the parts of the Neural-ART IP that are needed, at epoch level; this option can be set to 0 to power up everything every time, but be careful about the impact on power consumption)

Common operations

Getting input/output buffer information

This operation is useful in many essential use cases:

  • To set data in the input buffers before an inference, one must get the address of the input buffers.
  • To get data from the output buffers after an inference, one must get the address of the output buffers.

The following functions should be used to perform those actions:

const LL_Buffer_InfoTypeDef *LL_ATON_Output_Buffers_Info(const NN_Instance_TypeDef *nn_instance);
const LL_Buffer_InfoTypeDef *LL_ATON_Input_Buffers_Info(const NN_Instance_TypeDef *nn_instance);
const LL_Buffer_InfoTypeDef *LL_ATON_Internal_Buffers_Info(const NN_Instance_TypeDef *nn_instance);
// And utilities:
unsigned char *LL_Buffer_addr_start(const LL_Buffer_InfoTypeDef *buf);
unsigned char *LL_Buffer_addr_end(const LL_Buffer_InfoTypeDef *buf);
uint32_t LL_Buffer_len(const LL_Buffer_InfoTypeDef *buf);

All the LL_ATON_xxx_Buffers_Info functions return a pointer to an array of LL_Buffer_InfoTypeDef, which can be explored to retrieve the characteristics of all the tensors used during the inference.
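As a sketch, the array can be walked until the terminating entry, whose name field is NULL (see the structure shown further below):

const LL_Buffer_InfoTypeDef *bufs = LL_ATON_Input_Buffers_Info(&NN_Instance_Default);

for (int i = 0; bufs[i].name != NULL; i++) {
  printf("input %d: %s, %u bytes\n", i, bufs[i].name, (unsigned)LL_Buffer_len(&bufs[i]));
}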

For example, the first input tensor of a given model can be filled with 0xFF using the following snippet:

#define INPUT_NR 0
#define INPUT_VAL 0xFF
LL_ATON_DECLARE_NAMED_NN_INSTANCE_AND_INTERFACE(Default) 
const LL_Buffer_InfoTypeDef* inputs_ptr = LL_ATON_Input_Buffers_Info(&NN_Instance_Default); // Get a pointer to all input buffers info
const LL_Buffer_InfoTypeDef* first_input_ptr = &(inputs_ptr[INPUT_NR]);  // Get a pointer to the first buffer info
memset((void*)LL_Buffer_addr_start(first_input_ptr), INPUT_VAL, LL_Buffer_len(first_input_ptr));

To retrieve the associated tensor dimensions, the following fields of the LL_Buffer_InfoTypeDef C-struct should be used (note that this information is only valid for the input/output buffers). The other fields contain internal information generated by the NPU compiler for debugging purposes.

typedef struct
{
  const char *name;             /**< Buffer name. NULL if end of list */
  __LL_address_t addr_base;     /**< Buffer base address */
  uint32_t offset_start;        /**< Offset of the buffer start address from the base address */
  uint32_t offset_end;          /**< Offset of the buffer end address from the base address
                                   *   (first bytes address beyond buffer length) */
  ...
  const uint32_t *mem_shape;    /**< shape as seen by the user in memory (only valid for input/output buffers) */
  uint16_t mem_ndims;           /**< Number of dimensions of mem_shape (Length of mem_shape) */
  ...
} LL_Buffer_InfoTypeDef;
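For instance, a minimal sketch printing the shape of the first input tensor (assuming a printf-capable target):

const LL_Buffer_InfoTypeDef *in = LL_ATON_Input_Buffers_Info(&NN_Instance_Default);

printf("%s shape: ", in[0].name);
for (uint16_t d = 0; d < in[0].mem_ndims; d++) {
  printf("%u ", (unsigned)in[0].mem_shape[d]);
}
printf("\n");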

Connecting I/O buffers to the network

By default, the input and output buffers are allocated by the NPU compiler in the provided memory pools.
As shown in the compiler description and its interaction with STEdgeAI, it is possible to force the compiler not to generate such buffers and to let the user allocate them: --no-inputs-allocation (resp. --no-outputs-allocation) lets the compiler skip the allocation of the input buffers (resp. output buffers).

When this behavior is used, the generated network.c contains functions used to “connect” the user-allocated buffers to the network.
The following functions are of use when handling user-allocated input/output buffers (for example, in a case where only --no-inputs-allocation has been used):

// Inputs
LL_ATON_User_IO_Result_t LL_ATON_Set_User_Input_Buffer(const NN_Instance_TypeDef *nn_instance, uint32_t num, void *buffer, uint32_t size);
void *LL_ATON_Get_User_Input_Buffer(const NN_Instance_TypeDef *nn_instance, uint32_t num);
// Outputs
LL_ATON_User_IO_Result_t LL_ATON_Set_User_Output_Buffer(const NN_Instance_TypeDef *nn_instance, uint32_t num, void *buffer, uint32_t size);
void *LL_ATON_Get_User_Output_Buffer(const NN_Instance_TypeDef *nn_instance, uint32_t num);

The LL_ATON_Set_User_xxx_Buffer functions are used to connect the user-allocated buffer (3rd argument) to the num-th (2nd argument) input/output of the network instance given as the first argument. The size argument is mandatory and must exactly match the expected size (in bytes) of the given tensor.

The LL_ATON_Get_User_xxx_Buffer functions are used the other way around, to retrieve the address of the num-th (2nd argument) input/output buffer.

For example, to connect an input buffer to a network (named Default below), the following snippet can be used:

LL_ATON_DECLARE_NAMED_NN_INSTANCE_AND_INTERFACE(Default)
// Define a buffer for the input (an RGB image, 256 x 256 x 3 x 8 bits)
uint8_t input_buff[256 * 256 * 3];

// Connect the buffer to the first input of the network.
LL_ATON_User_IO_Result_t res = LL_ATON_Set_User_Input_Buffer(&NN_Instance_Default, 0, input_buff, 256 * 256 * 3);
assert(res == LL_ATON_User_IO_NOERROR);

The user is responsible for correctly allocating the buffers: those functions have no way of knowing whether the buffers are correctly allocated, other than trusting the arguments provided.
The types and the sizes of the buffers must match what the model expects.

Using a Real-Time Operating System (RTOS)

ll_aton comes in RTOS-flavored versions that can be activated through the LL_ATON_OSAL C-define.
When using an RTOS, ll_aton is mainly responsible for protecting the ST Neural-ART NPU from concurrent accesses that may cause issues (for example, two threads running an inference at the same time). This is done by locking and releasing shared resources using OS features (mutexes).

The two main supported RTOS are:

  • ThreadX / Azure RTOS / Eclipse ThreadX - LL_ATON_OSAL_THREADX
  • FreeRTOS™ - LL_ATON_OSAL_FREERTOS

In all cases, ll_aton:

  • Adds OS-specific initialization routines (automatically called by LL_ATON_RT_RuntimeInit)
  • Redefines wait-for-event/event generation using semaphores
  • Adds specific tooling to handle multiple threads running inferences in parallel
  • Hooks in the code required to make interrupts work correctly with the OS

Minimal execution unit

When using an operating system, threads and epochs are scheduled according to their priorities and timing; the runtime ensures that the following minimal execution units are not interrupted:

  • Epochs
  • Blobs, when using the epoch controller. With the epoch controller, one or more epochs are packed into a “blob”, a set of instructions scheduled by the epoch controller. If all epochs of a network have been packed into a single blob, the network is executed in one shot.

ThreadX

To enable the use of ThreadX:

  • Add ThreadX to the project
  • Add ll_aton_osal_threadx.c to the project
  • Define LL_ATON_OSAL=LL_ATON_OSAL_THREADX
  • Define the TX_HAS_PARALLEL_NETWORKS symbol:
    • TX_HAS_PARALLEL_NETWORKS=0 if only one thread is running an inference on the NPU
    • TX_HAS_PARALLEL_NETWORKS=1 if multiple threads are running inferences
  • Ensure LL_ATON_RT_MODE=LL_ATON_RT_ASYNC is set
  • Use the LL_ATON_OSAL_WFE macro in place of the _WFE in the network execution loop
  • Use networks with separated memory spaces

FreeRTOS™

To enable the use of FreeRTOS:

  • Add FreeRTOS™ suite to the project
  • Add ll_aton_osal_freertos.c to the project
  • Define LL_ATON_OSAL=LL_ATON_OSAL_FREERTOS
  • Define the FREERTOS_HAS_PARALLEL_NETWORKS symbol:
    • FREERTOS_HAS_PARALLEL_NETWORKS=0 if only one thread is running an inference on the NPU
    • FREERTOS_HAS_PARALLEL_NETWORKS=1 if multiple threads are running inferences
  • Ensure LL_ATON_RT_MODE=LL_ATON_RT_ASYNC is set
  • Use the LL_ATON_OSAL_WFE macro in place of the _WFE in the network execution loop
  • Use networks with separated memory spaces

Global tips

  • It is possible to implement one’s own port for a given OS by using LL_ATON_OSAL=LL_ATON_OSAL_USER_IMPL and creating a file ll_aton_osal_user_impl.h containing the custom implementations needed by ll_aton. The macros to be defined are listed in ll_aton_osal.h.
  • When running multiple inferences in parallel, be careful to ensure that the activation buffers of one ongoing network inference do not interfere with the activation buffers of another.

Example

Below is an example for FreeRTOS™, running two networks in two threads.

/* Define mutex and threads attributes */

osMutexId_t mutex_id ;

const osMutexAttr_t mutex_attr = {
    "myMutex",  // name of the mutex
    0,          // attr_bits
    NULL,       // memory for control block
    0           // size for control block
} ;

osThreadId_t Thread1Handle;
const osThreadAttr_t Thread1_attributes = {
  .name = "A",
  .priority = (osPriority_t) osPriorityNormal,
  .stack_size = 256 * 8
};

osThreadId_t Thread2Handle;
const osThreadAttr_t Thread2_attributes = {
  .name = "B",
  .priority = (osPriority_t) osPriorityNormal,
  .stack_size = 256 * 8
};

// Main thread

void mainthread_func(void* arg)
{
  mutex_id = osMutexNew ( &mutex_attr ) ;

  if ( mutex_id == NULL ) {
       while ( 1 ) ;
  }
  
  Thread1Handle = osThreadNew ( thread1_func , NULL , &Thread1_attributes ) ;
  Thread2Handle = osThreadNew ( thread2_func , NULL , &Thread2_attributes ) ;
  
  while ( 1 ) ;
}

// thread 1: initialize and handle the atonn_mnist network

void thread1_func(void* arg)
{
  int   testok ;
  void* output_buffer ;
  void* input_buffer ;
  int   output_buffer_len ;
  int   input_buffer_len ;
  LL_ATON_RT_RetValues_t ll_aton_rt_ret ;

  LL_ATON_DECLARE_NAMED_NN_INSTANCE_AND_INTERFACE ( atonn_mnist ) ; // Network instance

  /* Get Input/Output buffers */
  const LL_Buffer_InfoTypeDef *inbufs  = LL_ATON_Input_Buffers_Info_atonn_mnist();
  const LL_Buffer_InfoTypeDef *outbufs = LL_ATON_Output_Buffers_Info_atonn_mnist();

  input_buffer      = (void*)(inbufs[0].addr_base.i + inbufs[0].offset_start);
  input_buffer_len  = inbufs[0].offset_end - inbufs[0].offset_start;
  output_buffer     = (void*)(outbufs[0].addr_base.i + outbufs[0].offset_start);
  output_buffer_len = outbufs[0].offset_end - outbufs[0].offset_start;

  LL_ATON_RT_Init_Network ( &NN_Instance_atonn_mnist ) ;

  testok = 1 ;

  while ( testok ) {

          get_atonn_mnist_features ( input_buffer , input_buffer_len ) ;

          LL_ATON_RT_Reset_Network ( &NN_Instance_atonn_mnist ) ;

          do {
               ll_aton_rt_ret = LL_ATON_RT_RunEpochBlock ( &NN_Instance_atonn_mnist ) ;

               if ( ll_aton_rt_ret  == LL_ATON_RT_WFE ) {
                    LL_ATON_OSAL_WFE();
               }
          } while ( ll_aton_rt_ret != LL_ATON_RT_DONE ) ;

          post_process_atonn_mnist_inference ( output_buffer , output_buffer_len ) ;
  }
}

// thread 2: initialize and handle the atonn_mnist5x5 network

void thread2_func(void* arg)
{
  int   testok ;
  void* output_buffer ;
  void* input_buffer ;
  int   output_buffer_len ;
  int   input_buffer_len ;
  LL_ATON_RT_RetValues_t ll_aton_rt_ret ;

  LL_ATON_DECLARE_NAMED_NN_INSTANCE_AND_INTERFACE ( atonn_mnist5x5 ) ; // Network instance

  /* Get Input/Output buffers */
  const LL_Buffer_InfoTypeDef *inbufs  = LL_ATON_Input_Buffers_Info_atonn_mnist5x5();
  const LL_Buffer_InfoTypeDef *outbufs = LL_ATON_Output_Buffers_Info_atonn_mnist5x5();

  input_buffer      = (void*)(inbufs[0].addr_base.i + inbufs[0].offset_start);
  input_buffer_len  = inbufs[0].offset_end - inbufs[0].offset_start;
  output_buffer     = (void*)(outbufs[0].addr_base.i + outbufs[0].offset_start);
  output_buffer_len = outbufs[0].offset_end - outbufs[0].offset_start;

  LL_ATON_RT_Init_Network ( &NN_Instance_atonn_mnist5x5 ) ;

  testok = 1 ;

  while ( testok ) {

          get_atonn_mnist5x5_features ( input_buffer , input_buffer_len ) ;

          LL_ATON_RT_Reset_Network ( &NN_Instance_atonn_mnist5x5 ) ;

          do {
               ll_aton_rt_ret = LL_ATON_RT_RunEpochBlock ( &NN_Instance_atonn_mnist5x5 ) ;

               if ( ll_aton_rt_ret  == LL_ATON_RT_WFE ) {
                    LL_ATON_OSAL_WFE();
               }
          } while ( ll_aton_rt_ret != LL_ATON_RT_DONE ) ;

          post_process_atonn_mnist5x5_inference ( output_buffer , output_buffer_len ) ;
  }
}

All references to non-reentrant functions or to shared resources need to be protected with a mutex:

    osMutexAcquire ( mutex_id , osWaitForever ) ;
    ....
    osMutexRelease ( mutex_id ) ;

Full code example for running an inference using the “ll_aton” runtime API

#include "ll_aton_runtime.h"
#include "ll_aton_caches_interface.h"

// Network instance declaration

LL_ATON_DECLARE_NAMED_NN_INSTANCE_AND_INTERFACE ( Default ) ;

NN_Interface_TypeDef *NN_InterfacePnt ;
NN_Instance_TypeDef  *NN_InstancePnt  ;

void* input_start_address ;
void* input_end_address ;
void* output_start_address ;
void* output_end_address ;

void main ( void )
{
  applicationSetup() ; // system initialization

  networkSetup() ;

  // Enter inference loop
  
  while ( 1 ) {

          getFeature ( input_start_address ) ; // Wait for input data from somewhere
          
          featurePreProcess ( input_start_address ) ; // Feature pre-processing (if any)

          inferenceExecute ( NN_InstancePnt ) ;

          inferencePostProcess ( output_start_address ) ; // Inference post-processing (if any)

  } /* endwhile */
}

void networkSetup ( void )
{
  LL_ATON_RT_RuntimeInit() ; // Initialize runtime

  // Set references to the network "Default"

  NN_InterfacePnt = (NN_Interface_TypeDef *) &NN_Interface_Default ;
  NN_InstancePnt  = (NN_Instance_TypeDef  *) &NN_Instance_Default ;

  // Initialize network

  LL_ATON_RT_Init_Network ( NN_InstancePnt ) ;   

  // Set callback (if any)

  //LL_ATON_RT_SetEpochCallback ( EpochBlock_TraceCallBack , NN_InstancePnt ) ; // deprecated in future releases
  LL_ATON_RT_SetNetworkCallback ( NN_InstancePnt, EpochBlock_TraceCallBack) ;

  // Get I/O buffers addresses
  
  const LL_Buffer_InfoTypeDef * buffersInfos = NN_InterfacePnt -> input_buffers_info();

  input_start_address = LL_Buffer_addr_start(&buffersInfos[0]);
  input_end_address   = LL_Buffer_addr_end(&buffersInfos[0]);

  buffersInfos = NN_InterfacePnt -> output_buffers_info();

  output_start_address = LL_Buffer_addr_start(&buffersInfos[0]);
  output_end_address   = LL_Buffer_addr_end(&buffersInfos[0]);
}

int inferenceExecute ( NN_Instance_TypeDef *networkInstancePnt )
{
  LL_ATON_RT_RetValues_t ll_aton_rt_ret;

  if ( networkInstancePnt == NULL ) {
       return ( -1 ) ;
  } /* endif */

  LL_ATON_Cache_NPU_Invalidate();  /* if the NPU cache is used */
  LL_ATON_Cache_MCU_Clean_Invalidate_Range(input_start_address, (uint8_t *)input_end_address - (uint8_t *)input_start_address);
  LL_ATON_Cache_MCU_Invalidate_Range(output_start_address, (uint8_t *)output_end_address - (uint8_t *)output_start_address);

  LL_ATON_RT_Reset_Network(networkInstancePnt); // Reset the network instance object

  do
  {
    /* Execute first/next step of Cube.AI/ATON runtime */

    ll_aton_rt_ret = LL_ATON_RT_RunEpochBlock(networkInstancePnt);

    /* Wait for next event */

    if (ll_aton_rt_ret == LL_ATON_RT_WFE)
    { /*** subject to change to fit also user code requirements ***/
      LL_ATON_OSAL_WFE();
    }
  } while (ll_aton_rt_ret != LL_ATON_RT_DONE);

  return ( 0 ) ;
}

Encryption using the API

Details about how to achieve and use encryption can be found in the encryption article.