Embedded Inference Client ST Edge AI API (st-ai)


ST Edge AI Core Technology 2.2.0



r1.2

Introduction

This article describes the ST Edge AI embedded inference client API, which must be used by a C application layer (the AI client) to use a deployed C-model. This API is available for the ISPU, STM32, and STELLAR targets. All model-specific definitions and implementations can be found in the generated C-files: <name>.c and <name>.h.

Integration model/view and dependencies

The figure above shows that integrating the ST Edge AI stack in an application is simple and straightforward, since the run-time has only a few, standard SW/HW dependencies. The ST Edge AI client uses the generated model through a set of well-defined stai_<name>_XXX() functions (also called the "Embedded inference client ST Edge AI API"). The ST Edge AI software stack provides a compiled library (i.e., a network runtime library) per ST device and supported tool-chain.

Getting started - Minimal application

The following code snippet provides a typical, minimal example using the API for a 32b floating-point model. The pre-trained model is generated with the --no-inputs-allocation and --no-outputs-allocation options (i.e., the input and output buffers are not allocated in the "activations" buffer, and the default network C-name is used). Note that all resources requested by the AI client (the activations buffer and the data buffers for the IO) are allocated at compile time thanks to the generated STAI_NETWORK_XXX_SIZE macros, allowing a minimal, easy, and quick integration.

#include <stdio.h>

#include "network.h"

/* Global byte buffer to save instantiated C-model network context */
STAI_ALIGNED(STAI_NETWORK_CONTEXT_ALIGNMENT)
static stai_network network_context[STAI_NETWORK_CONTEXT_SIZE] = {0};

/* Global c-array to handle the activations buffer */
STAI_ALIGNED(STAI_NETWORK_ACTIVATION_1_ALIGNMENT)
static uint8_t activations[STAI_NETWORK_ACTIVATION_1_SIZE_BYTES];

/* Array to store the data of the input tensor */
STAI_ALIGNED(STAI_NETWORK_IN_1_ALIGNMENT)
static float in_data[STAI_NETWORK_IN_1_SIZE];
/* or static uint8_t in_data[STAI_NETWORK_IN_1_SIZE_BYTES]; */

/* c-array to store the data of the output tensor */
STAI_ALIGNED(STAI_NETWORK_OUT_1_ALIGNMENT)
static float out_data[STAI_NETWORK_OUT_1_SIZE];
/* static uint8_t out_data[STAI_NETWORK_OUT_1_SIZE_BYTES]; */

/* Arrays of pointers to manage the model's input/output tensors */
static stai_ptr stai_input[STAI_NETWORK_IN_NUM];
static stai_ptr stai_output[STAI_NETWORK_OUT_NUM];

/* 
 * Bootstrap
 */
int aiInit(void) {
  stai_return_code ret_code;
  
  /* Initialize runtime library */
  ret_code = stai_runtime_init();
  if (ret_code != STAI_SUCCESS) { ... };

  /* Initialize network model context */
  ret_code = stai_network_init(network_context);
  if (ret_code != STAI_SUCCESS) { ... };

  /* Set network activations buffers */
  const stai_ptr acts[] = { activations };
  ret_code = stai_network_set_activations(network_context, acts, STAI_NETWORK_ACTIVATIONS_NUM);
  if (ret_code != STAI_SUCCESS) { ... };

  return 0;
}

int aiDeinit(void) {
  stai_return_code ret_code;

  /* Deinitialize network model context */
  ret_code = stai_network_deinit(network_context);
  if (ret_code != STAI_SUCCESS) { ... };

  /* Deinitialize runtime library */
  ret_code = stai_runtime_deinit();
  if (ret_code != STAI_SUCCESS) { ... };

  return 0;
}

/* 
 * Run inference
 */
int aiRun(const void *in_data, void *out_data) {
  stai_return_code ret_code;

  /* Set network input/output buffers */
  const stai_ptr inputs_ptr[] = { (stai_ptr)in_data };
  ret_code = stai_network_set_inputs(network_context, inputs_ptr, STAI_NETWORK_IN_NUM);
  if (ret_code != STAI_SUCCESS) { ... };

  const stai_ptr outputs_ptr[] = { (stai_ptr)out_data };
  ret_code = stai_network_set_outputs(network_context, outputs_ptr, STAI_NETWORK_OUT_NUM);
  if (ret_code != STAI_SUCCESS) { ... };


  /* Perform the inference */
  ret_code = stai_network_run(network_context, STAI_MODE_SYNC);
  if (ret_code != STAI_SUCCESS) {
      ret_code = stai_network_get_error(network_context);
      ...
  }
  
  return 0;
}

/* 
 * Example of main loop function
 */
void main_loop(void) {
  aiInit();

  while (1) {
    /* 1 - Acquire, pre-process and fill the input buffers */
    acquire_and_process_data(in_data);

    /* 2 - Call inference engine */
    aiRun(in_data, out_data);

    /* 3 - Post-process the predictions */
    post_process(out_data);
  }

  aiDeinit();
}

AI buffers and privileged placement

From the application/integration point of view, only four memory-related objects are considered as dimensioning for the system. They are of fixed size since dynamic tensors are not supported, i.e., all tensor sizes and shapes are defined/fixed at generation time. As a consequence, the inference C run-time engine does not need to use the system heap.

  • "activations" buffers consist of a single contiguous memory-mapped buffer (or multiple memory-mapped buffers if a JSON memory description file is passed to the CLI, see the Multiple heap support section). They are placed in a read-write memory segment and are owned and allocated by the AI client application (unless the --allocate-activations option is used, in which case the generated code instantiates them). Their addresses are passed to the network instance (see the stai_<name>_set_activations() function) and they are used as a set of temporary private heaps (or working buffers) during the execution of the inference to store intermediate results. Between two inferences, the associated memory segments can be reused by the application. Their sizes (STAI_<NAME>_ACTIVATIONS_SIZES) are defined during code generation and their sum corresponds to the reported RAM metric. In addition, the macros STAI_<NAME>_ACTIVATION_X_SIZE and STAI_<NAME>_ACTIVATION_X_SIZE_BYTES are generated for each buffer. The number of activations buffers is exposed to the client application through STAI_<NAME>_ACTIVATIONS_NUM and their total size is reported in STAI_<NAME>_ACTIVATIONS_SIZE
  • "states" buffers consist of a single contiguous memory-mapped buffer (or multiple memory-mapped buffers if a JSON memory description file is passed to the CLI, see the Multiple heap support section). They are placed in a read-write memory segment and are owned and allocated by the AI client application (unless the --allocate-states option is used, in which case the generated code instantiates them). Their addresses are passed to the network instance (see the stai_<name>_set_states() function) and they are used as a set of persistent private heaps (or working buffers) during the execution of the inference to store the internal state of a stateful network. Between two inferences, the associated memory segments cannot be reused by the application. Their sizes (STAI_<NAME>_STATES_SIZES) are defined during code generation. In addition, the macros STAI_<NAME>_STATE_X_SIZE and STAI_<NAME>_STATE_X_SIZE_BYTES are generated for each buffer. States buffers are generated only if a model contains at least one of the supported stateful operators and the stateful property is set to true. The number of states buffers is exposed to the client application through STAI_<NAME>_STATES_NUM and their total size is reported in STAI_<NAME>_STATES_SIZE
  • "weights" buffers are a single contiguous memory-mapped buffer (or multiple memory-mapped buffers with the --split-weights option). They are generally placed in a non-volatile, read-only memory device. Their addresses may optionally be passed to the network instance (see the stai_<name>_set_weights() function) and they are used as a set of read-only memory buffers during the execution of the inference. Their sizes (STAI_<NAME>_WEIGHTS_SIZES) are defined during code generation and their sum corresponds to the reported ROM metric. In addition, the macros STAI_<NAME>_WEIGHT_X_SIZE and STAI_<NAME>_WEIGHT_X_SIZE_BYTES are generated for each buffer. The number of weights buffers is exposed to the client application through STAI_<NAME>_WEIGHTS_NUM and their total size is reported in STAI_<NAME>_WEIGHTS_SIZE
  • "input" and "output" buffers must also be placed in read-write memory. By default, they are owned and provided by the AI client. Their sizes are model dependent and known at generation time (STAI_<NAME>_IN/OUT_SIZE_BYTES), but they can also be located in the "activations" buffer. Their addresses may optionally be passed to the network instance (see the stai_<name>_set_[inputs|outputs]() functions); they are memory buffers used to store the network input/output data during the execution of the inference
Default data memory layout

Note

The placement of the buffers is application linker and/or runtime dependent. Additional ROM and RAM for the network runtime library itself and for the network C-files (text/rodata/bss and data sections) should also be considered, but they are generally not significant for dimensioning the system compared to the sizes requested for the "weights" and "activations" buffers.

The following table details the privileged placement choices adopted when targeting STM32 devices to minimize the inference time. Usually the most constrained memory object is the "activations" buffer.

memory object type, and where it is preferably placed:

  • client stack: a low latency & high bandwidth device. STM32 embedded SRAM or data-TCM when available (zero wait-state memory).
  • activations, inputs/outputs: a low/medium latency & high bandwidth device. STM32 embedded SRAM when available, or external RAM. The trade-off is mainly driven by the size and by whether the STM32 MCU has a data cache (Cortex-M7 family). If the input buffers are not allocated in the "activations" buffer, the "activations" buffer should be privileged.
  • weights: a medium latency & medium bandwidth device. STM32 embedded FLASH memory or external FLASH. The trade-off is driven by the availability of a data cache on the STM32 MCU (Cortex-M7 family); the weights can be split between different memory devices.

I/O buffers inside the “activations” buffer

By default, the input and output buffers are allocated in the "activations" buffer. During generation, the minimal size of the "activations" buffer is adjusted accordingly. Note that the base addresses of the respective memory sub-regions depend on the model. These addresses are not necessarily aligned with the base address of the "activations" buffer and are pre-defined or pre-computed at generation time. For more details, refer to the code snippet. Inside the "activations" buffer, the reserved memory regions are 4-byte aligned (or 8-byte aligned) according to the selected target. For specific needs, the user can define the requested memory alignment for the input buffers (respectively, the output buffers) with the --input-memory-alignment INT option (respectively, --output-memory-alignment).
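A client can verify such alignment requirements at run time with a simple pointer check. A minimal sketch, assuming an 8-byte requirement as an example value (is_aligned is a hypothetical helper, not part of the generated API):

```c
#include <stdint.h>

/* Returns 1 if the pointer meets the requested power-of-two alignment.
 * Such a check can be applied to IO buffers (for example, sub-regions of
 * the activations buffer) before they are handed to the network. */
static int is_aligned(const void *ptr, uintptr_t alignment)
{
  return ((uintptr_t)ptr & (alignment - 1u)) == 0u;
}

/* 8-byte aligned storage, as could be requested with
 * --input-memory-alignment 8 (example value). */
_Alignas(8) static uint8_t in_buffer[64];
```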

Data memory layout
  • The "external" input/output buffers (i.e., allocated outside the activations buffer) can always be used, since the input and output buffer addresses can be overwritten with the stai_<name>_set_inputs and stai_<name>_set_outputs functions, respectively.
  • By default, the code generator reserves only one place per input tensor in the activations buffer (and similarly for the output tensors). If a double-buffering scheme is to be implemented, it is recommended to use the --no-[inputs|outputs]-allocation options and manage the IO buffers in the application.
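The application-managed double-buffering scheme mentioned above can be sketched as a simple ping-pong pair of input buffers. The buffer size is a placeholder, and the actual stai_<name>_set_inputs()/stai_<name>_run() calls are only represented by a comment:

```c
#include <stdint.h>

#define IN_SIZE_BYTES 128u  /* placeholder for STAI_<NAME>_IN_1_SIZE_BYTES */

/* Two ping-pong input buffers owned by the application
 * (model generated with --no-inputs-allocation). */
static uint8_t in_buffers[2][IN_SIZE_BYTES];
static int active = 0;

/* Return the buffer to be filled next and flip the active index.
 * In a real application, the buffer just filled would be passed to
 * stai_<name>_set_inputs() before stai_<name>_run(), while the other
 * buffer is being filled with the next acquisition. */
static uint8_t *next_input_buffer(void)
{
  uint8_t *buf = in_buffers[active];
  active ^= 1;
  return buf;
}
```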

Multiple heap support

To optimize RAM usage for performance reasons, or because the available memory is fragmented between different memory pools (embedded in the device or external), the activations buffer can be allocated in different memory segments.

Thanks to the --memory-pool option, the user can provide a description of the memories available on the device (i.e., the memory pools) which can be used to place the activation/scratch/state tensor buffers. During code generation, the allocator privileges the memory pools according to their characteristics (i.e., latency and throughput); pools with the same characteristics are used in the order listed in the JSON file. If a selected memory pool cannot be used (insufficient size), the next one is used if available, otherwise an error is generated. The allocator tries to place each buffer according to the characteristics of the buffer and of the available memory pools, including their residual sizes. The preferred location for critical buffers (e.g., the scratch buffers) is a memory pool with high throughput and low latency. Pools with low throughput and high latency (e.g., external RAM) are used only if no space is available in more performant memory pools. The final placement of the buffers also depends on the optimization objective option defined in the CLI (balanced being the default):

  • time optimization hints the allocator to place buffers in the highest-performance pools, regardless of the final RAM size used.
  • ram optimization hints the allocator to minimize the total amount of RAM used, potentially placing the buffers in less performant memory pools.
  • balanced is a trade-off where the allocator tries to put buffers in the best-fitting memory pool while minimizing the total amount of RAM used.

The following figure illustrates a case where the activations buffer is split into three parts. The first part is placed in a low latency/high throughput memory (such as the DTCM for the STM32H7/F7 series), the second in a "normal" internal memory, and the last in an external memory. The JSON file must indicate the maximum size of each memory segment (the "usable_size" key) which can be used by the AI stack, allowing part of the critical memory resources to be reserved for other SW objects.

Data memory layout with multiple heaps

The following code snippet illustrates the initialization sequence. The 'activationsX' objects are placed in different memory pools through specific linker directives provided by the end user.

...
STAI_ALIGNED(STAI_NETWORK_ACTIVATION_1_ALIGNMENT)
static uint8_t activations1[STAI_NETWORK_ACTIVATION_1_SIZE_BYTES];

STAI_ALIGNED(STAI_NETWORK_ACTIVATION_2_ALIGNMENT)
static uint8_t activations2[STAI_NETWORK_ACTIVATION_2_SIZE_BYTES];

STAI_ALIGNED(STAI_NETWORK_ACTIVATION_3_ALIGNMENT)
static uint8_t activations3[STAI_NETWORK_ACTIVATION_3_SIZE_BYTES];
...
int aiInit(void) {
  stai_return_code ret_code;
  
  /* Create and initialize the c-model */
  const stai_ptr acts[] = { activations1, activations2, activations3 };

  /* Initialize runtime library */
  ret_code = stai_runtime_init();
  if (ret_code != STAI_SUCCESS) { ... }

  /* Initialize network model context */
  ret_code = stai_network_init(network_context);
  if (ret_code != STAI_SUCCESS) { ... }

  /* Set network activations buffers */
  ret_code = stai_network_set_activations(network_context, acts, STAI_NETWORK_ACTIVATIONS_NUM);
  if (ret_code != STAI_SUCCESS) { ... }
  ...
}

Split weights buffer

The --split-weights option allows the weights to be placed statically, tensor by tensor, in different memory segments (on- or off-chip) through specific linker directives in the end-user application.

  • It relaxes the constraint of placing one large buffer in a constrained and non-homogeneous memory sub-system.
  • After profiling, it allows the global inference time to be improved by placing the critical weights in a low latency memory. Conversely, it can free a critical resource (i.e., internal flash) for use by the application.
Example of split weights buffer (static placement)

The --split-weights option prevents the generation of a unique C-array for all the data of the weights/bias tensors (<name>_data.c file). Without the option, the weights are declared as:

STAI_ALIGNED(8)
const uint8_t s_network_weights[ 794136 ] = {
    0xcf, 0xae, 0x9d, 0x3d, 0x1b, 0x0c, 0xd1, 0xbd, 0x63, 0x99,
    0x36, 0xbd, 0xdb, 0x67, 0x46, 0xbe, 0x3b, 0xe7, 0x0d, 0x3e,
    ...
    0x41, 0xbf, 0xc6, 0x7d, 0x69, 0x3e, 0x18, 0x87, 0x37,
    0xbe, 0x83, 0x63, 0x0f, 0x3f, 0x51, 0xa1, 0xdd, 0xbe
  };

and declared in the <name>_data.h header as follows:

STAI_ALIGNED(8)
extern const uint8_t s_network_weights[ 794136 ];

In contrast, with --split-weights a s_<network>_<layer_name>_[bias|weights|*]_array_weights[] C-array is created to store the data of each weight/bias tensor. A global map table is also built, which is used by the run-time to retrieve the addresses of the different C-arrays.

...
/* conv2d_1_weights_array - FLOAT|CONST */
STAI_ALIGNED(8)
const uint8_t s_network_conv2d_1_weights_array_weights[ 2048 ] = {
  0xcf, 0xae, 0x9d, 0x3d, 0x1b, 0x0c, 0xd1, 0xbd, 0x63, 0x99,
...
}
...
/* dense_3_bias_array - FLOAT|CONST */
STAI_ALIGNED(8)
const uint8_t s_network_dense_3_bias_array_weights[ 24 ] = {
  0xa2, 0x72, 0x82, 0x3e, 0x5a, 0x88, 0x41, 0xbf, 0xc6, 0x7d,
  0x69, 0x3e, 0x18, 0x87, 0x37, 0xbe, 0x83, 0x63, 0x0f, 0x3f,
  0x51, 0xa1, 0xdd, 0xbe
};
  • Without particular linker directives, these multiple C-arrays are placed in the .rodata section, as for the unique C-array.
  • The client API is unchanged: the stai_<name>_get_weights() function is used to retrieve the addresses of the weights buffers, and the stai_<name>_set_weights() function is used to set the weights addresses.
  • As illustrated in the previous figure, the const C-attribute can be manually commented out to use the default C start-up behavior, which copies the data into an initialized RAM data section.

Re-entrance and thread safety considerations

No internal synchronization mechanism is implemented to protect the entry points against concurrent accesses. If the API is used in a multi-threaded context, the protection of the instantiated NN(s) must be guaranteed by the application layer itself. To minimize RAM usage, the same activations memory chunk (SizeSHARED) can be used to support multiple networks. In this case, the user must guarantee that an on-going inference execution cannot be preempted by the execution of another network.

SizeSHARED = MAX(STAI_<name>_ACTIVATIONS_SIZE_BYTES) for name = "NET1" … "NETn"
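The sizing rule above can be expressed directly in C. A sketch assuming two networks; the two size values are placeholders standing in for the generated STAI_<NAME>_ACTIVATIONS_SIZE_BYTES macros:

```c
#include <stdint.h>

/* Placeholder values; in a real project these come from the generated
 * STAI_NET1_ACTIVATIONS_SIZE_BYTES and STAI_NET2_ACTIVATIONS_SIZE_BYTES. */
#define NET1_ACTIVATIONS_SIZE_BYTES 20480u
#define NET2_ACTIVATIONS_SIZE_BYTES 34816u

#define MAX_SIZE(a, b) ((a) > (b) ? (a) : (b))

/* Single shared chunk sized for the largest network. It can back both
 * network instances as long as their inferences never preempt each other. */
static uint8_t shared_activations[MAX_SIZE(NET1_ACTIVATIONS_SIZE_BYTES,
                                           NET2_ACTIVATIONS_SIZE_BYTES)];
```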

Tip

If the preemption is expected for real-time constraint or latency reasons, each network instance must have its own and private activations buffer.

Debug support

The network runtime library must be considered an optimized black-box object in binary format (source files are not delivered). There are no run-time services allowing internal states to be dumped. The mapping and porting of the model is guaranteed by the ST Edge AI code generator. Some integration issues can be highlighted by the stai_<name>_get_error() function.

Versioning

The generated <network>.h file contains a set of macros, imported from the stai.h header, that indicate the version of the tool used to generate the specialized NN C-files and the versions of the associated run-time API.

Warning

Backward and/or forward compatibility between the generated code and the run-time library is not fully guaranteed. If a new version of the tool is used to generate new specialized NN C-files, it is highly recommended to also update the associated header files and network run-time library.

/* stai.h file */

#define STAI_TOOLS_VERSION_MAJOR 1
#define STAI_TOOLS_VERSION_MINOR 0
#define STAI_TOOLS_VERSION_MICRO 0

#define STAI_API_VERSION_MAJOR 1
#define STAI_API_VERSION_MINOR 0
#define STAI_API_VERSION_MICRO 0
type description
STAI_TOOLS_VERSION_XX global version of the tool package
STAI_API_VERSION_XX version of the API used by the generated NN C-files to call the network runtime library
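One way for an application to guard against the compatibility issue above is to check these macros against the version it was built and tested for. A minimal sketch: the macro values are copied from the listing above, and api_version_is_compatible is a hypothetical helper (not part of the generated API) that only accepts a matching major version:

```c
#include <stdbool.h>

/* Values mirroring the macros imported from stai.h (see listing above). */
#define STAI_API_VERSION_MAJOR 1
#define STAI_API_VERSION_MINOR 0
#define STAI_API_VERSION_MICRO 0

/* A major-version mismatch between generated code and runtime is treated
 * as incompatible; minor/micro differences are tolerated in this sketch. */
static bool api_version_is_compatible(int expected_major)
{
  return STAI_API_VERSION_MAJOR == expected_major;
}
```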

ST Edge AI STAI C APIs return codes

Each ST Edge AI C API returns an exit code reporting the status of the call. If the API succeeds, the returned stai_return_code is STAI_SUCCESS. In case of an internal error (e.g., because of mismatches in the arguments provided to the API), different return codes can be returned. The C network context keeps track of the first error, so subsequently invoked APIs, as well as stai_<name>_get_error(stai_network*), always return the first generated error. The stai_return_code can be used by the client application to manage and report errors.

C-enum description
STAI_SUCCESS No errors triggered. The API succeeded
STAI_RUNNING_NO_WFE Currently not supported
STAI_RUNNING_WFE Currently not supported
STAI_DONE The API completed its tasks
STAI_ERROR_GENERIC Generic Error code
STAI_ERROR_NETWORK_INVALID_API_ARGUMENTS (at least one) invalid argument has been provided to the API
STAI_ERROR_NETWORK_INVALID_CONTEXT_HANDLE The provided context pointer is not valid (e.g. it is NULL or it is corrupted)
STAI_ERROR_NETWORK_INVALID_CONTEXT_SIZE The provided context has different (byte) size than expected
STAI_ERROR_NETWORK_INVALID_CONTEXT_ALIGNMENT The provided context pointer has an invalid alignment
STAI_ERROR_NETWORK_INVALID_INFO The network C info is corrupted
STAI_ERROR_NETWORK_INVALID_RUN The stai_<name>_run() API failed
STAI_ERROR_NETWORK_INVALID_RUNTIME The runtime initialization failed
STAI_ERROR_NETWORK_INVALID_ACTIVATIONS_PTR (at least one of) the activation buffers pointers is invalid
STAI_ERROR_NETWORK_INVALID_ACTIVATIONS_NUM Wrong number of activation buffers provided
STAI_ERROR_NETWORK_INVALID_IN_PTR (at least one of) the input buffers pointers is invalid
STAI_ERROR_NETWORK_INVALID_IN_NUM Wrong number of input buffers provided
STAI_ERROR_NETWORK_INVALID_OUT_PTR (at least one of) the output buffers pointers is invalid
STAI_ERROR_NETWORK_INVALID_OUT_NUM Wrong number of output buffers provided
STAI_ERROR_NETWORK_INVALID_STATES_PTR (at least one of) the state buffers pointers is invalid
STAI_ERROR_NETWORK_INVALID_STATES_NUM Wrong number of state buffers provided
STAI_ERROR_NETWORK_INVALID_WEIGHTS_PTR (at least one of) the weights buffers pointers is invalid
STAI_ERROR_NETWORK_INVALID_WEIGHTS_NUM Wrong number of weights buffers provided
STAI_ERROR_NETWORK_INVALID_CALLBACK Invalid callback pointer set
STAI_ERROR_NOT_IMPLEMENTED API is not implemented (i.e., for the specific target)
STAI_ERROR_INVALID_BUFFER_ALIGNMENT A buffer pointer has an invalid expected alignment
STAI_ERROR_NOT_CURRENT_NETWORK Wrong context handle provided (e.g., orchestrating Multiple Networks)
STAI_ERROR_NETWORK_STILL_RUNNING The last inference has not yet been completed
STAI_ERROR_STAI_INIT_FAILED Failed initialization of an init API
STAI_ERROR_STAI_DEINIT_FAILED Failed de-initialization of a deinit API
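For logging and error reporting, a client typically maps return codes to strings. A sketch covering a small subset of the codes listed above; the enum here is a placeholder standing in for the real stai_return_code from stai.h (the actual enumerator values may differ), and stai_strerror is a hypothetical helper:

```c
#include <string.h>

/* Placeholder subset of stai_return_code; real definitions come from stai.h. */
typedef enum {
  STAI_SUCCESS = 0,
  STAI_ERROR_GENERIC,
  STAI_ERROR_NETWORK_INVALID_CONTEXT_HANDLE,
} stai_return_code;

/* Map a return code to a short human-readable description. */
static const char *stai_strerror(stai_return_code code)
{
  switch (code) {
    case STAI_SUCCESS:
      return "success";
    case STAI_ERROR_GENERIC:
      return "generic error";
    case STAI_ERROR_NETWORK_INVALID_CONTEXT_HANDLE:
      return "invalid context handle";
    default:
      return "unknown error";
  }
}
```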

ST Edge AI STAI_<NAME>_XXX C-defines

Different C-defines are generated in the <name>.h header file. They can be used by the application code to allocate the requested buffers at compile time or dynamically, or for debug purposes. At run time, stai_<network>_get_info() can also be used to retrieve the requested sizes.

C-defines description
STAI_<NAME>_MODEL_NAME C-string with the C-name of the model
STAI_<NAME>_MODEL_SIGNATURE C-Model generated checksum as a hex number
STAI_<NAME>_ORIGIN_MODEL_NAME C-string with the original name of the model
STAI_<NAME>_ORIGIN_MODEL_SIGNATURE C-string with the checksum of the original model
STAI_<NAME>_MACC_NUM C-Model estimated complexity (as a number of MAC operations)
STAI_<NAME>_NODES_NUM C-Model number of operational nodes generated
STAI_<NAME>_IN/OUT/ACTIVATIONS/WEIGHTS/STATES_NUM total number of input/output/activations/weights/states buffers
STAI_<NAME>_IN/OUT/ACTIVATIONS/WEIGHTS/STATES_SIZE total number of elements of all the in/out/activations/weights/states buffers
STAI_<NAME>_IN/OUT/ACTIVATIONS/WEIGHTS/STATES_SIZE_BYTES total size in bytes of all the in/out/activations/weights/states buffers
STAI_<NAME>_IN/OUT/ACTIVATIONS/WEIGHTS/STATES_ALIGNMENTS C-table (integer type) specifying the alignment of the in/out/activations/weights/states buffers
STAI_<NAME>_IN/OUT/ACTIVATIONS/WEIGHTS/STATES_SIZES C-table (integer type) specifying the number of items per in/out/activations/weights/states buffer (see "IO tensor description" section)
STAI_<NAME>_IN/OUT/ACTIVATIONS/WEIGHTS/STATES_SIZES_BYTES C-table (integer type) specifying the size in bytes per in/out/activation/weight/state buffer
STAI_<NAME>_IN/OUT/ACTIVATION/WEIGHT/STATE_x_SIZE total number of items for the x-th in/out/activation/weight/state buffer
STAI_<NAME>_IN/OUT/ACTIVATION/WEIGHT/STATE_x_SIZE_BYTES size in bytes for the x-th in/out/activation/weight/state buffer
STAI_<NAME>_IN/OUT/ACTIVATION/WEIGHT/STATE_x_ALIGNMENT expected C-array alignment for the x-th in/out/activation/weight/state buffer
STAI_<NAME>_IN/OUT/ACTIVATION/WEIGHT/STATE_x_FLAGS flags for the x-th in/out/activation/weight/state buffer
STAI_<NAME>_IN/OUT_x_NAME C-string (optional) storing the name for the x-th in/out buffer
STAI_<NAME>_IN/OUT_x_FORMAT C-format using stai enums for the x-th in/out buffer
STAI_<NAME>_IN/OUT_x_SHAPE expected C-shape dimension values for the x-th in/out buffer
STAI_<NAME>_IN/OUT_x_RANK expected cardinality of the C-shape of the x-th in/out buffer
STAI_<NAME>_IN/OUT_x_BATCH (optional) batch dimension size for the x-th in/out buffer
STAI_<NAME>_IN/OUT_x_CHANNEL (optional) channel dimension size for the x-th in/out buffer
STAI_<NAME>_IN/OUT_x_WIDTH (optional) width dimension size for the x-th in/out buffer
STAI_<NAME>_IN/OUT_x_HEIGHT (optional) height dimension size for the x-th in/out buffer
STAI_<NAME>_IN/OUT_x_SCALE (optional) scale quantization factor for the x-th in/out buffer
STAI_<NAME>_IN/OUT_x_ZERO_POINT (optional) zeropoint quantization factor for the x-th in/out buffer
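The relationship between the _SIZE (element count) and _SIZE_BYTES defines can be relied on when sizing application buffers. A sketch with placeholder values for a hypothetical 10-element float32 input tensor (the real macros come from the generated <name>.h):

```c
/* Placeholder values standing in for the generated macros of a
 * 10-element float32 input tensor. */
#define IN_1_SIZE       10u                           /* element count */
#define IN_1_SIZE_BYTES (IN_1_SIZE * sizeof(float))   /* byte size */

/* A buffer dimensioned from the element count, as in the
 * getting-started snippet: float in_data[IN_1_SIZE]. */
static float in_data[IN_1_SIZE];
```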

ST Edge AI client APIs

This section presents the list of C APIs for interfacing generated C-network models with a client. They are declared in the stai.h header file (model-independent APIs) and the <name>.h header file (model-dependent APIs).

stai_runtime_init()

stai_return_code stai_runtime_init(void);

This mandatory function is used by the application to initialize the ST Edge AI C run-time. It must be called exactly once, before using the AI platform.

#include "network.h"

...

stai_return_code ret_code = stai_runtime_init();
if (ret_code != STAI_SUCCESS) {
  ...
}

stai_runtime_deinit()

stai_return_code stai_runtime_deinit(void);

This mandatory function is used by the application to de-initialize the ST Edge AI C run-time. It must be called exactly once, when the AI platform is no longer needed.

#include "network.h"

...

stai_return_code ret_code = stai_runtime_deinit();
if (ret_code != STAI_SUCCESS) {
  ...
}

stai_runtime_get_info(stai_runtime_info* info)

stai_return_code stai_runtime_get_info(stai_runtime_info* info);

This function is used by the application to retrieve information about the ST Edge AI C run-time. The information is filled into the stai_runtime_info C struct, defined as follows:

typedef struct {
  stai_version            api_version;      /* X.Y.Z version of the ST Edge AI APIs */
  stai_version            runtime_version;  /* X.Y.Z version of the runtime */
  stai_version            tools_version;    /* version of the tool compatible with the run-time */
  uint32_t                runtime_build;    /* 32bit run-time identifier  (i.e. build info) */
  stai_compiler_id        compiler_id;      /* compiler ID enum */
  const char*             compiler_desc;    /* string with a short description of the compiler */
} stai_runtime_info;

The following code snippet shows how to print some of the stai_runtime_info fields:

#include <stdio.h>
#include "network.h"

...

stai_runtime_info info;
stai_return_code ret_code = stai_runtime_get_info(&info);
if (ret_code == STAI_SUCCESS) {
  printf("Runtime info:\n");
  printf("  - api_version    : %d.%d.%d\n",
    info.api_version.major, info.api_version.minor, info.api_version.micro);
  printf("  - runtime_version: %d.%d.%d\n",
    info.runtime_version.major, info.runtime_version.minor, info.runtime_version.micro);
  printf("  - runtime_build  : 0x%08x\n", info.runtime_build);
  printf("  - compiler_id    : 0x%02x\n", info.compiler_id);
  printf("  - compiler_desc  : %s\n\n", info.compiler_desc);
  ...
}
...

stai_<name>_init()

stai_return_code stai_network_init(stai_network* network);

This mandatory function is used by the application to initialize the internal data structures (i.e., context) of a generated network model.

  • NOTE: network pointer should be a valid byte array with size STAI_NETWORK_CONTEXT_SIZE and proper alignment STAI_NETWORK_CONTEXT_ALIGNMENT
#include "network.h"

STAI_ALIGNED(STAI_NETWORK_CONTEXT_ALIGNMENT)
static stai_network context[STAI_NETWORK_CONTEXT_SIZE] = {0};
...

stai_return_code ret_code = stai_network_init(context);

stai_<name>_deinit()

stai_return_code stai_network_deinit(stai_network* network);

This mandatory function is used by the application to de-initialize the internal run-time data structures of a network context.

  • network pointer should be a valid byte array with size STAI_NETWORK_CONTEXT_SIZE and proper alignment STAI_NETWORK_CONTEXT_ALIGNMENT
#include "network.h"

...
STAI_ALIGNED(STAI_NETWORK_CONTEXT_ALIGNMENT)
static stai_network context[STAI_NETWORK_CONTEXT_SIZE] = {0};
...

stai_return_code ret_code = stai_network_deinit(context);

stai_<name>_get_info()

stai_return_code stai_<name>_get_info(stai_network* network, stai_network_info* info);

This function retrieves the run-time data attributes of an instantiated model. Refer to the stai.h file for the details of the returned stai_network_info C-structure.

Warning

  • before invoking this call, the network context must already have been initialized.

Typical usage

#include "network.h"
...
STAI_ALIGNED(STAI_NETWORK_CONTEXT_ALIGNMENT)
static stai_network context[STAI_NETWORK_CONTEXT_SIZE] = {0};
...

stai_network_info info;
stai_return_code ret_code;
...
  /* 1 - Initialize runtime */
  stai_runtime_init();
  /* 2 - Create and initialize network context */
  stai_network_init(network_context);
...
  ret_code = stai_network_get_info(network_context, &info);
  if (ret_code == STAI_SUCCESS) { /* display the info */ }

Example of possible reported information

Network information
---------------------------------------------------
 c_model_name          : network
 c_model_signature     : e38d1b6095099638ae20a42e53398dd7
 c_model_datetime      : Mon Nov 30 19:40:02 2023
 c_compile_datetime    : Nov 30 2023 19:40:31
 runtime_version       : 9.1.0
 tool_version          : 1.0.0
 api_version           : 1.0.0
 flags                 : 0x0
 n_macc                : 336088
 n_nodes               : 3
 n_states              : 0
 n_weights             : 1
 n_inputs              : 1
 n_outputs             : 1

stai_<name>_get_[inputs|outputs]()

stai_return_code stai_<name>_get_inputs(stai_network* network, stai_ptr* inputs, stai_size* n_inputs);
stai_return_code stai_<name>_get_outputs(stai_network* network, stai_ptr* outputs, stai_size* n_outputs);

These functions return the network input/output array base addresses as a table of array pointers. The number of inputs (respectively, outputs) is returned through n_inputs (respectively, n_outputs).

These functions can be used for:

  • Feeding data into the network: retrieve the input pointers and write data to them
  • Retrieving output data after an inference: retrieve the output pointers and read data from them

stai_<name>_get_[activations|weights|states]()

stai_return_code stai_<name>_get_activations(stai_network* network, stai_ptr* activations, stai_size* n_activations);
stai_return_code stai_<name>_get_weights(stai_network* network, stai_ptr* weights, stai_size* n_weights);
stai_return_code stai_<name>_get_states(stai_network* network, stai_ptr* states, stai_size* n_states);

These functions return the network activations/weights/states array base addresses as a table of array pointers. The number of activation/weights/persistent state buffers is returned through n_activations/n_weights/n_states.

These functions can be used, e.g., for debugging. The stai_<name>_[get|set]_states() functions can also be used to save/restore the state of a stateful network at any given time.
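Saving and restoring the state of a stateful network essentially amounts to copying the contents of the state buffer. A sketch with a placeholder state buffer; in a real application the buffer address and size would come from stai_<name>_get_states() and STAI_<NAME>_STATE_1_SIZE_BYTES, and the inference calls are simulated here with memset:

```c
#include <stdint.h>
#include <string.h>

#define STATE_SIZE_BYTES 32u  /* placeholder for STAI_<NAME>_STATE_1_SIZE_BYTES */

static uint8_t state_buffer[STATE_SIZE_BYTES];   /* as returned by get_states() */
static uint8_t state_snapshot[STATE_SIZE_BYTES];

/* Simulate: inference mutates the state, the client saves a snapshot,
 * the state changes again, then the snapshot is restored.
 * Returns the first state byte after restoration. */
static uint8_t demo_state_save_restore(void)
{
  memset(state_buffer, 0xAB, sizeof(state_buffer));            /* inference #1 */
  memcpy(state_snapshot, state_buffer, sizeof(state_buffer));  /* save */
  memset(state_buffer, 0x00, sizeof(state_buffer));            /* inference #2 */
  memcpy(state_buffer, state_snapshot, sizeof(state_buffer));  /* restore */
  return state_buffer[0];
}
```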

stai_<name>_set_[inputs|outputs]()

stai_return_code stai_<name>_set_inputs(stai_network* network, stai_ptr* inputs, const stai_size n_inputs);
stai_return_code stai_<name>_set_outputs(stai_network* network, stai_ptr* outputs, const stai_size n_outputs);

These functions set the network input/output buffer base addresses as a table of array pointers. The table of pointers must be allocated by the client before invoking the API. n_inputs is the number of inputs to be set, while n_outputs is the number of outputs to be set (i.e. the number of pointers in each table).

Typical use-cases are:

  • If the --no-[inputs|outputs]-allocation CLI option is used, these functions must be called to set the addresses of the IO buffers used during the inference; otherwise, the inference will result in an error.

  • These functions can also be used to overwrite the addresses of the IO buffers, for example to support a double-buffering scheme. Note that the original addresses of the buffers allocated in the activations buffer (default behavior), initially returned by the stai_<name>_get_[inputs|outputs]() functions, are not preserved. When a double-buffering scheme is expected, it is recommended to use the --no-[inputs|outputs]-allocation CLI options.

These APIs shall be called before invoking stai_<name>_run().
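A minimal double-buffering sketch could look as follows (default network C-name network; the model is assumed generated with --no-inputs-allocation and has a single float input; the acquisition stage filling the idle buffer is left out):

```c
#include "network.h"

/* Two client-allocated input buffers used in ping-pong fashion */
static float in_buf_a[STAI_NETWORK_IN_1_SIZE];
static float in_buf_b[STAI_NETWORK_IN_1_SIZE];

void run_with_double_buffering(stai_network *network_context)
{
  stai_ptr bufs[2] = { (stai_ptr)in_buf_a, (stai_ptr)in_buf_b };
  int cur = 0;

  for (;;) {
    stai_ptr inputs[STAI_NETWORK_IN_NUM] = { bufs[cur] };

    /* Bind the buffer just filled by the acquisition stage, then run */
    stai_network_set_inputs(network_context, inputs, STAI_NETWORK_IN_NUM);
    stai_network_run(network_context, STAI_MODE_SYNC);

    /* While the inference consumed bufs[cur], the DMA/acquisition
       stage can fill bufs[1 - cur] */
    cur = 1 - cur;
  }
}
```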

Tip

The STAI_<name>_IN_NUM (respectively STAI_<name>_OUT_NUM) helper macro can be used to know at compile time the number of network inputs (respectively outputs). These values are also returned through the n_inputs and n_outputs fields of the stai_network_info struct (see the stai_<name>_get_info() function).

stai_<name>_set_[activations|weights|states]()

stai_return_code stai_<name>_set_activations(stai_network* network, stai_ptr* activations, const stai_size n_activations);
stai_return_code stai_<name>_set_weights(stai_network* network, stai_ptr* weights, const stai_size n_weights);
stai_return_code stai_<name>_set_states(stai_network* network, stai_ptr* states, const stai_size n_states);

These functions set the network activations/weights/states buffer base addresses as a table of array pointers. The table of pointers must be allocated by the client before invoking the API. n_activations/n_weights/n_states are the number of activation, weight, and persistent state buffers respectively.

Typical use-cases are:

  • stai_<name>_set_activations() must be invoked if --allocate-activations is not used, to provide the runtime with the client-allocated activations buffer pointers.

  • stai_<name>_set_states() must be invoked if --allocate-states is not used, to provide the runtime with the client-allocated (persistent) states buffer pointers.

  • stai_<name>_set_weights() must be invoked if --binary was used, to provide the runtime with the weights buffer pointers.

When used, the client application is responsible for providing these functions with enough addresses in the table and with pointers to properly sized memory chunks (this information is statically defined in the generated files).

See Multiple heap support for an example of use.

Tip

STAI_<NAME>_[ACTIVATIONS|WEIGHTS|STATES]_NUM can be used to find the number of elements in the tables to be defined. STAI_<NAME>_[ACTIVATIONS|WEIGHTS|STATES]_SIZES_BYTES can be used to find the size of each element.
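The activations case above can be sketched as follows (default network C-name network, single activations buffer, model generated without --allocate-activations; heap allocation is only one possible placement):

```c
#include <stdlib.h>
#include "network.h"

/* Client-side allocation of the activations buffer(s) */
int alloc_and_set_activations(stai_network *network_context)
{
  static stai_ptr activations[STAI_NETWORK_ACTIVATIONS_NUM];

  /* Single activations buffer assumed; note that the buffer must also
     satisfy STAI_NETWORK_ACTIVATION_1_ALIGNMENT */
  activations[0] = (stai_ptr)malloc(STAI_NETWORK_ACTIVATION_1_SIZE_BYTES);
  if (activations[0] == NULL)
    return -1;

  if (stai_network_set_activations(network_context, activations,
                                   STAI_NETWORK_ACTIVATIONS_NUM) != STAI_SUCCESS)
    return -1;

  return 0;
}
```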

stai_<name>_run()

stai_return_code stai_<name>_run(stai_network* network, const stai_run_mode mode);

This function is called to run the neural network inference. The API may be blocking or non-blocking depending on the mode parameter. The default mode is STAI_MODE_SYNC, which is implemented for all targets. The non-blocking mode (i.e., STAI_MODE_ASYNC) is currently not supported.

The returned value is a stai_return_code value: if this value is STAI_SUCCESS, the run has completed correctly; otherwise, the first error is reported. See also the stai_<name>_get_error(stai_network*) function.

Typical usages

The default use case is illustrated by the “Getting started” code snippet. The following code is an example with a C-model that has one input and two output tensors that need to be allocated by the client application. The activations buffer has already been allocated by the tool using the CLI option --allocate-activations and is configured as part of the stai_network_init() API tasks.

#include <stdio.h>
#include "network.h"
/* Network context allocation */
STAI_ALIGNED(STAI_NETWORK_CONTEXT_ALIGNMENT)
static stai_network network_context[STAI_NETWORK_CONTEXT_SIZE] = {0};

...
/* C-table to store the address of the input buffer */
static float in_1_data[STAI_NETWORK_IN_1_SIZE];
static stai_ptr in_data[STAI_NETWORK_IN_NUM] = {
  &in_1_data[0]
};


/* Data buffers for the output tensors */
static float out_1_data[STAI_NETWORK_OUT_1_SIZE];
static float out_2_data[STAI_NETWORK_OUT_2_SIZE];

/* C-table to store the addresses of the output buffers */
static stai_ptr out_data[STAI_NETWORK_OUT_NUM] = {
  &out_1_data[0],
  &out_2_data[0]
};

...
int aiInit(void) {
  /* 1 - Initialize runtime */
  stai_runtime_init();
  ...
  /* 2 - Create and initialize network context */
  ...
  stai_network_init(network_context);


  /* 3 - Update the AI input/output buffers */
  stai_network_set_inputs(network_context, in_data, STAI_NETWORK_IN_NUM);
  stai_network_set_outputs(network_context, out_data, STAI_NETWORK_OUT_NUM);
  ...
}

int aiDeinit(void) {
  /* 1 - De-initialize network context */
  stai_network_deinit(network_context);

  /* 2 - Deinitialize runtime */
  stai_runtime_deinit();
  ...
}

void main_loop()
{
  aiInit();

  while (1) {
    /* 1 - Acquire, pre-process and fill the input buffers */
    acquire_and_process_data(in_data);
    
    /* 2 - Call inference engine */
    stai_network_run(network_context, STAI_MODE_SYNC);

    /* 3 - Post-process the predictions */
    post_process(out_data);
  }
  aiDeinit();
}

stai_<name>_get_context_size()

stai_size stai_<name>_get_context_size(void);

This function can be used by the AI client application to get the size, in bytes, of the network model's internal context, in order to allocate or report it.
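As a sketch, the context can be allocated at run-time instead of with the compile-time STAI_NETWORK_CONTEXT_SIZE macro (default network C-name network; aligned_alloc is a C11 facility used here for illustration, it requires a power-of-two alignment and a size rounded up to a multiple of it):

```c
#include <stdlib.h>
#include "network.h"

stai_network *create_context(void)
{
  stai_size size = stai_network_get_context_size();

  /* Round the size up to a multiple of the required alignment,
     as mandated by aligned_alloc() */
  size = (size + STAI_NETWORK_CONTEXT_ALIGNMENT - 1)
         & ~(stai_size)(STAI_NETWORK_CONTEXT_ALIGNMENT - 1);

  stai_network *ctx = (stai_network *)aligned_alloc(STAI_NETWORK_CONTEXT_ALIGNMENT, size);
  if (ctx != NULL)
    stai_network_init(ctx);
  return ctx;
}
```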

stai_<name>_get_error()

stai_return_code stai_<name>_get_error(stai_network* network);

This function can be used by the client application to retrieve the first error reported during the execution of a stai_<name>_xxx() function.

  • See the stai.h C header for the list of returned error codes (stai_return_code).

Typical ST Edge AI error function handler (debug/log purpose)

#include "network.h"
...
void aiLogErr(const stai_return_code err)
{
  printf("E: STAI error - code=0x%x\r\n", err);
}

Retrieve network and tensors information

The following code snippet shows how to retrieve the network and tensor information from a stai_network_info C-struct.

#include <stdio.h>
#include "network.h"

void dump_tensor_info(const stai_tensor* t) {
  printf("  name: %s\n", t->name);
  printf("     bytes(%u) flags(0x%08x) format(0x%08x)\n", t->size_bytes, t->flags, t->format);
}

{
  /* Initialize the network context */
  ...
  /* Network context should have been already initialized */
  stai_network_info info;
  stai_return_code ret = stai_network_get_info(network_context, &info);
  uint16_t i;

  if (ret == STAI_SUCCESS) {
    printf("Model name: %s\n", info.c_model_name);

    printf("Model inputs : %u\n", info.n_inputs);
    for (i=0; i<info.n_inputs; i++) {
      dump_tensor_info(&info.inputs[i]);
    }

    printf("Model outputs: %u\n", info.n_outputs);
    for (i=0; i<info.n_outputs; i++) {
      dump_tensor_info(&info.outputs[i]);
    }
  }
  ...
}

Base address of the IO buffers

The following code snippet illustrates the minimum instructions required to retrieve the effective addresses of the buffers from the activations buffer. If the input and/or output buffers are not allocated in the activations buffer, NULL is returned, unless the client application has already set the input/output buffer pointers using the stai_<name>_set_[inputs|outputs]() API. Note that the network context should already have been initialized.

#include "network.h"

/* Network context allocation */
STAI_ALIGNED(STAI_NETWORK_CONTEXT_ALIGNMENT)
static stai_network network_context[STAI_NETWORK_CONTEXT_SIZE] = {0};

/* Buffer should be allocated by the application */
static uint8_t in_data_1[STAI_NETWORK_IN_1_SIZE_BYTES];

...

{
  stai_ptr inputs[STAI_NETWORK_IN_NUM];
  stai_size n_inputs = 0;
  stai_network_get_inputs(network_context, inputs, &n_inputs);
...
}

float32 to 8b data type conversion

The following code snippet illustrates the float (float) to integer (int8_t/uint8_t) format conversion. The input buffer is used as the destination buffer.

#include "network.h"

/* Network context allocation */
STAI_ALIGNED(STAI_NETWORK_CONTEXT_ALIGNMENT)
static stai_network network_context[STAI_NETWORK_CONTEXT_SIZE] = {0};


#define _MIN(x_, y_) \
    ( ((x_)<(y_)) ? (x_) : (y_) )

#define _MAX(x_, y_) \
    ( ((x_)>(y_)) ? (x_) : (y_) )

#define _CLAMP(x_, min_, max_, type_) \
    (type_) (_MIN(_MAX(x_, min_), max_))

#define _ROUND(v_, type_) \
    (type_) ( ((v_)<0) ? ((v_)-0.5f) : ((v_)+0.5f) )

const stai_tensor* get_input_desc(stai_network_info* info, const uint8_t idx)
{
  if (info && (idx > 0) && (idx <= info->n_inputs)) {
    return &info->inputs[idx - 1];
  }
  return NULL;
}

float input_f[STAI_NETWORK_IN_1_SIZE];
int8_t input_q[STAI_NETWORK_IN_1_SIZE]; /* or uint8_t */

stai_network_info net_info;

{
  stai_runtime_init();
  stai_network_init(network_context);
  stai_network_get_info(network_context, &net_info);

  const stai_tensor *input_1 = get_input_desc(&net_info, 1);
  if (input_1 == NULL) {...}

  const float scale = 1.0f / input_1->scale.data[0]; /* pre-compute 1/scale */
  const int32_t zp = input_1->zeropoint.data[0];

  /* Loop */
  for (int i=0; i < STAI_NETWORK_IN_1_SIZE; i++)
  {
    const int32_t tmp_ = zp + _ROUND(input_f[i] * scale, int32_t);
    /* for uint8_t models */
    input_q[i] = _CLAMP(tmp_, 0, 255, uint8_t);
    /* or, for int8_t models */
    input_q[i] = _CLAMP(tmp_, -128, 127, int8_t);
  }
  ...

  ...
  stai_network_deinit(network_context);
  stai_runtime_deinit();
}

8b to float32 data type conversion

The following code snippet illustrates the integer (int8_t/uint8_t) to float (float) format conversion. The output buffer is used as the source buffer.

#include "network.h"

/* Network context allocation */
STAI_ALIGNED(STAI_NETWORK_CONTEXT_ALIGNMENT)
static stai_network network_context[STAI_NETWORK_CONTEXT_SIZE] = {0};

int8_t output_q[STAI_NETWORK_OUT_1_SIZE]; /* or uint8_t */
float output_f[STAI_NETWORK_OUT_1_SIZE];

stai_network_info net_info;

const stai_tensor* get_output_desc(stai_network_info* info, const uint8_t idx)
{
  if (info && (idx > 0) && (idx <= info->n_outputs)) {
    return &info->outputs[idx - 1];
  }
  return NULL;
}

{
  stai_runtime_init();
  stai_network_init(network_context);
  stai_network_get_info(network_context, &net_info);

  const stai_tensor* output_1 = get_output_desc(&net_info, 1);
  if (output_1 == NULL) {...}

  const float scale  = output_1->scale.data[0];
  const int32_t zp = output_1->zeropoint.data[0];

  /* Loop */
  for (int i=0; i<STAI_NETWORK_OUT_1_SIZE; i++)
  {
    output_f[i] = scale * ((float)(output_q[i]) - zp);
  }
  ...

  ...
  stai_network_deinit(network_context);
  stai_runtime_deinit();
}

C-memory layouts

To store the elements of the different tensors, standard C array-of-array structures are used. For the float and integer data types, the associated C type is used. For the bool type, values are stored as uint8_t, while for binary tensors, to optimize the size, values are packed into 32-bit words (see the “c-layout of the s1 type” section for more details).

Note

Note that the ‘BHWC’ data format (or ‘channel-last’) is used for illustration, but for the ‘BCHW’ data format (or ‘channel-first’), the C-memory layout follows the same principle.

1d-array tensor

For a 1-D tensor, a standard C-array type with the following memory layout is expected to handle the input and output tensors.

#include "network.h"

/* in network.h STAI_<NAME>_IN_1_SIZE is defined ( where e.g. IN_1_SIZE = H * W * C = C ) */

float xx_data[B * STAI_<NAME>_IN_1_SIZE]; /* n_batch = B, height = 1,
                                             width = 1, channels = C */
or:

float xx_data[B][STAI_<NAME>_IN_1_SIZE]; 
1-D Tensor data layout

2d-array tensor

For a 2-D tensor, a standard C array-of-array memory arrangement is used to handle the input and output tensors. The two dimensions are mapped to the first two dimensions of the tensor in the original toolbox representation: e.g. H and C in Keras/TensorFlow.

#include "network.h"

/* STAI_<NAME>_IN_1_SIZE defined in network.h ( where e.g. IN_1_SIZE = H * W * C = H * C ) */

float xx_data[STAI_<NAME>_IN_1_SIZE];  /* n_batch = 1, height = H,
                                          width = 1, channels = C */
/* STAI_<NAME>_IN_1_HEIGHT, STAI_<NAME>_IN_1_CHANNEL
   defined in network.h */
float xx_data[STAI_<NAME>_IN_1_HEIGHT][STAI_<NAME>_IN_1_CHANNEL];

float xx_data[B * STAI_<NAME>_IN_1_SIZE]; /* n_batch = B, height = H,
                                                  width = 1, channels = C */
float xx_data[B][STAI_<NAME>_IN_1_HEIGHT][STAI_<NAME>_IN_1_CHANNEL];
2-D Tensor data layout

3d-array tensor

For a 3-D tensor, a standard C array-of-array-of-array memory arrangement is used to handle the input and output tensors.

#include "network.h"

/* STAI_<NAME>_IN_1_SIZE defined in network.h ( where IN_1_SIZE = H * W * C ) */

float xx_data[STAI_<NAME>_IN_1_SIZE];  /* n_batch = 1, height = H,
                                          width = W, channels = C */
/* STAI_<NAME>_IN_1_HEIGHT, STAI_<NAME>_IN_1_WIDTH, STAI_<NAME>_IN_1_CHANNEL
   defined in network.h */
float xx_data[STAI_<NAME>_IN_1_HEIGHT][STAI_<NAME>_IN_1_WIDTH][STAI_<NAME>_IN_1_CHANNEL];

float xx_data[B * STAI_<NAME>_IN_1_SIZE]; /* n_batch = B, height = H,
                                                  width = W, channels = C */
float xx_data[B][STAI_<NAME>_IN_1_HEIGHT][STAI_<NAME>_IN_1_WIDTH][STAI_<NAME>_IN_1_CHANNEL];
3-D Tensor data layout
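The flat C-array and array-of-array views above address the same elements. For a channel-last (‘BHWC’) tensor, the flat offset of element (b, h, w, c) can be computed as sketched below (a generic helper for illustration, not part of the generated API):

```c
#include <stddef.h>

/* Flat offset of element (b, h, w, c) in a channel-last (BHWC) tensor
   of shape B x H x W x C: the channel index varies fastest */
size_t bhwc_offset(size_t b, size_t h, size_t w, size_t c,
                   size_t H, size_t W, size_t C)
{
  return ((b * H + h) * W + w) * C + c;
}
```

For example, with B=2, H=3, W=4, C=5, element (1, 2, 3, 4) lives at flat index ((1*3 + 2)*4 + 3)*5 + 4 = 119, the last element of the 120-element buffer.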