Embedded Inference Client ST Edge AI API (st-ai)
ST Edge AI Core Technology 2.2.0
r1.2
Introduction
This article describes the ST Edge AI embedded inference client
API, which must be used by a C-application layer (AI client) to use a
deployed C-model. This API is available for ISPU, STM32, and STELLAR
targets. All model-specific definitions and implementations can be
found in the generated C-files: <name>.c and <name>.h.
The figure above shows that the integration of the ST Edge AI stack
in an application is simple and straightforward, since the run-time has
few, standard SW/HW dependencies. The ST Edge AI client
uses the generated model through a set of well-defined stai_<name>_XXX()
functions (also called the "Embedded inference client ST Edge AI
API"). The ST Edge AI software stack provides a compiled
library (i.e. network runtime library) per ST device and supported
tool-chain.
Getting started - Minimal application
The following code snippet provides a typical, minimal example
using the API for a 32b floating-point model. The pre-trained model
is generated with the --no-inputs-allocation and
--no-outputs-allocation options (i.e. the input and output
buffers are not allocated in the "activations" buffer, and the default
network c-name is used). Note that all resources requested by the AI
client (the activations buffer and the data buffers for the IO)
are allocated at compile time thanks to the generated STAI_NETWORK_XXX_SIZE
macros, allowing a minimal, easy, and quick integration.
#include <stdio.h>
#include "network.h"

/* Global byte buffer to save the instantiated C-model network context */
STAI_ALIGNED(STAI_NETWORK_CONTEXT_ALIGNMENT)
static stai_network network_context[STAI_NETWORK_CONTEXT_SIZE] = {0};

/* Global c-array to handle the activations buffer */
STAI_ALIGNED(STAI_NETWORK_ACTIVATION_1_ALIGNMENT)
static uint8_t activations[STAI_NETWORK_ACTIVATION_1_SIZE_BYTES];

/* c-array to store the data of the input tensor */
STAI_ALIGNED(STAI_NETWORK_IN_1_ALIGNMENT)
static float in_data[STAI_NETWORK_IN_1_SIZE];
/* or static uint8_t in_data[STAI_NETWORK_IN_1_SIZE_BYTES]; */

/* c-array to store the data of the output tensor */
STAI_ALIGNED(STAI_NETWORK_OUT_1_ALIGNMENT)
static float out_data[STAI_NETWORK_OUT_1_SIZE];
/* or static uint8_t out_data[STAI_NETWORK_OUT_1_SIZE_BYTES]; */

/* Arrays of pointers to manage the model's input/output tensors */
static stai_ptr stai_input[STAI_NETWORK_IN_NUM];
static stai_ptr stai_output[STAI_NETWORK_OUT_NUM];

/*
 * Bootstrap
 */
int aiInit(void) {
  stai_return_code ret_code;

  /* Initialize runtime library */
  ret_code = stai_runtime_init();
  if (ret_code != STAI_SUCCESS) { ... };

  /* Initialize network model context */
  ret_code = stai_network_init(network_context);
  if (ret_code != STAI_SUCCESS) { ... };

  /* Set network activations buffers */
  const stai_ptr acts[] = { activations };
  ret_code = stai_network_set_activations(network_context, acts, STAI_NETWORK_ACTIVATIONS_NUM);
  if (ret_code != STAI_SUCCESS) { ... };

  return 0;
}

int aiDeinit(void) {
  stai_return_code ret_code;

  /* Deinitialize network model context */
  ret_code = stai_network_deinit(network_context);
  if (ret_code != STAI_SUCCESS) { ... };

  /* Deinitialize runtime library */
  ret_code = stai_runtime_deinit();
  if (ret_code != STAI_SUCCESS) { ... };

  return 0;
}

/*
 * Run inference
 */
int aiRun(const void *in_data, void *out_data) {
  stai_return_code ret_code;

  /* 1 - Set network input/output buffers */
  const stai_ptr inputs_ptr[] = { (stai_ptr)in_data };
  ret_code = stai_network_set_inputs(network_context, inputs_ptr, STAI_NETWORK_IN_NUM);
  if (ret_code != STAI_SUCCESS) { ... };

  const stai_ptr outputs_ptr[] = { (stai_ptr)out_data };
  ret_code = stai_network_set_outputs(network_context, outputs_ptr, STAI_NETWORK_OUT_NUM);
  if (ret_code != STAI_SUCCESS) { ... };

  /* 2 - Perform the inference */
  ret_code = stai_network_run(network_context, STAI_MODE_SYNC);
  if (ret_code != STAI_SUCCESS) {
    ret_code = stai_network_get_error(network_context);
    ...
  };

  return 0;
}

/*
 * Example of main loop function
 */
void main_loop() {
  aiInit();

  while (1) {
    /* 1 - Acquire, pre-process and fill the input buffers */
    acquire_and_process_data(in_data);

    /* 2 - Call inference engine */
    aiRun(in_data, out_data);

    /* 3 - Post-process the predictions */
    post_process(out_data);
  }

  aiDeinit();
}
AI buffers and privileged placement
From the application/integration point of view, only four memory-related objects are considered as dimensioning for the system. They are fixed size, since there is no support for dynamic tensors, i.e., all the sizes and shapes of the tensors are defined/fixed at generation time. In this way the inference C run-time engine does not require the system heap.
- "activations" buffers consist of a simple contiguous
memory-mapped buffer (or multiple memory-mapped buffers if a JSON
memory description file is passed to the CLI, see the Multiple heap support section). They
are placed into a read-write memory segment and are owned and
allocated by the AI client application (unless the
--allocate-activations
option is used, in which case the generated code instantiates them). Their addresses are passed to the network instance (see the stai_<name>_set_activations()
function) and they are used as a set of temporary private heaps (or working buffers) during the execution of the inference to store the intermediate results. Between two inferences, the associated memory segments can be reused by the application. Their sizes (STAI_<NAME>_ACTIVATIONS_SIZES) are defined during the code generation and their sum corresponds to the reported RAM
metric. A set of macros STAI_<NAME>_ACTIVATION_X_SIZE and STAI_<NAME>_ACTIVATION_X_SIZE_BYTES
is also generated for each buffer. The number of activations buffers is exposed to the client application by STAI_<NAME>_ACTIVATIONS_NUM
and their total size is reported in STAI_<NAME>_ACTIVATIONS_SIZE.
- "states" buffers consist of a simple contiguous memory-mapped
buffer (or multiple memory-mapped buffers if a JSON memory
description file is passed to the CLI, see the Multiple heap support section). They
are placed into a read-write memory segment and are owned and
allocated by the AI client application (unless the
--allocate-states
option is used, in which case the generated code instantiates them). Their addresses are passed to the network instance (see the stai_<name>_set_states()
function) and they are used as a set of persistent private heaps (or working buffers) during the execution of the inference to store the internal state of a stateful network. Between two inferences, the associated memory segments can not be reused by the application. Their sizes (STAI_<NAME>_STATES_SIZES) are defined during the code generation. A set of macros STAI_<NAME>_STATE_X_SIZE and STAI_<NAME>_STATE_X_SIZE_BYTES
is also generated for each buffer. States buffers are generated only if a model contains at least one of the supported stateful operators and the stateful property is set to true. The number of states buffers is exposed to the client application by STAI_<NAME>_STATES_NUM
and their total size is reported in STAI_<NAME>_STATES_SIZE.
- "weights" buffers are a simple contiguous memory-mapped buffer
(or multiple memory-mapped buffers with the
--split-weights
option). They are generally placed into a non-volatile and read-only memory device. Their addresses may be optionally passed to the network instance (see the stai_<name>_set_weights()
function) and they are used as a set of read-only memory buffers during the execution of the inference. Their sizes (STAI_<NAME>_WEIGHTS_SIZES) are defined during the code generation and their sum corresponds to the reported ROM
metric. A set of macros STAI_<NAME>_WEIGHT_X_SIZE and STAI_<NAME>_WEIGHT_X_SIZE_BYTES
is also generated for each buffer. The number of weights buffers is exposed to the client application by STAI_<NAME>_WEIGHTS_NUM
and their total size is reported in STAI_<NAME>_WEIGHTS_SIZE.
- "output" and "input" buffers must also be placed in
read-write memory-mapped buffers. By default, they are owned and
provided by the AI client. Their sizes are model dependent and known
at generation time (STAI_<NAME>_IN/OUT_SIZE_BYTES), but they can also be located in the "activations" buffer. Their addresses may be optionally passed to the network instance (see the stai_<name>_set_[inputs|outputs]()
functions) and they are used as a set of memory buffers storing the network input/output data during the execution of the inference.
Note
The placement of the buffers is application linker and/or runtime dependent. Additional ROM and RAM for the network runtime library itself and for the network c-files (text/rodata/bss and data sections) can also be considered, but they are generally not significant for dimensioning the system in comparison with the requested size for the "weights" and "activations" buffers.
The following table details the privileged placement choices adopted when targeting STM32 devices to minimize the inference time. Usually the most constrained memory object is the "activations" buffer.
memory object type | preferably placed in |
---|---|
client stack | a low latency & high bandwidth device. STM32 embedded SRAM or data-TCM when available (zero wait-state memory). |
activations, inputs/outputs | a low/medium latency & high bandwidth device. STM32 embedded SRAM when available or external RAM. The trade-off is mainly driven by the size and if the STM32 MCU has a data cache (Cortex-M7 family). If input buffers are not allocated in the “activations” buffer, the “activations” buffer should be privileged. |
weights | a medium latency & medium bandwidth device. STM32 embedded FLASH memory or external FLASH. The trade-off is driven by the STM32 MCU data cache availability (Cortex-M7 family), the weights can be split between different memory devices. |
I/O buffers inside the “activations” buffer
By default, the input and output buffers are allocated in the
"activations" buffer. During generation, the minimal size of the
"activations" buffer is adjusted accordingly. Please note that the
base addresses of the respective memory sub-regions depend on the
model. These addresses are not necessarily aligned with the base
address of the "activations" buffer and are pre-defined or
pre-computed at generation time. For more details, refer to the code snippet. Inside the
"activations" buffer, the reserved memory regions are 4-byte
(or 8-byte) aligned according to the selected target. For specific
needs, the user can define the requested
memory alignment for the input buffers (respectively the output
buffers) with the --input-memory-alignment INT
option (respectively --output-memory-alignment).
- The "external" input/output buffers (i.e., allocated outside the
activations buffer) can always be used, since the input and output buffer
addresses can be overwritten with the
stai_<name>_set_inputs
and stai_<name>_set_outputs
functions, respectively.
- By default, the code generator reserves only one place per input
tensor in the activations buffer (and similarly for output buffers).
If a double-buffering scheme should be implemented, it is recommended
to use the
--no-[inputs|outputs]-allocation
options and manage the IO buffers in the application.
Multiple heap support
To optimize the usage of the RAM for performance reasons, or because the available memory is fragmented between different memory pools (embedded in the device or external), the activations buffer can be allocated in different memory segments.
Thanks to the --memory-pool
option, the user can provide a description of the memories
available on the device (i.e. memory pools) which can be used to
place the activation/scratch/state tensor buffers. During the code
generation, the allocator privileges the memory pools according
to their characteristics (i.e. latency and throughput) and, for
pools with the same characteristics, by the order listed in the JSON
file. If a selected memory pool cannot be used (insufficient size),
the next one is used if available, else an error is generated. The
allocator tries to place each buffer according to the
characteristics of the buffer, the characteristics of
the available memory pools, and their residual size. The preferred
location for critical buffers (e.g. the scratch
buffers) is a memory pool with high throughput and low latency.
Pools with low throughput and high latency (e.g. external RAM) are
used only if no space is available in more performant memory
pools. The final placement of the buffers also depends on the optimization objective
option defined in the CLI (balanced is the default):
- time optimization hints the allocator to place buffers in the pools with the highest performance, regardless of the final RAM size used.
- ram optimization hints the allocator to minimize the total amount of RAM used, potentially placing the buffers in less performant memory pools.
- balanced is a trade-off where the allocator tries to put buffers in the best-fitting memory pool while minimizing the total amount of RAM used.
The following figure illustrates the case where the activations
buffer is split in three. The first part is placed in the low
latency/high throughput memory (like DTCM for the STM32H7/F7 series),
the second is placed in a "normal" internal memory, and the last in
an external memory. The JSON file is requested to indicate the
maximum size of each memory segment ("usable_size" key)
which can be used by the AI stack, allowing to reserve a part of the
critical memory resources for other SW objects.
The following code snippet illustrates the initialization sequence.
The 'activationsX' objects will be placed in different
memory pools thanks to specific linker directives written by the
end-user.
...
STAI_ALIGNED(STAI_NETWORK_ACTIVATION_1_ALIGNMENT)
static uint8_t activations1[STAI_NETWORK_ACTIVATION_1_SIZE_BYTES];
STAI_ALIGNED(STAI_NETWORK_ACTIVATION_2_ALIGNMENT)
static uint8_t activations2[STAI_NETWORK_ACTIVATION_2_SIZE_BYTES];
STAI_ALIGNED(STAI_NETWORK_ACTIVATION_3_ALIGNMENT)
static uint8_t activations3[STAI_NETWORK_ACTIVATION_3_SIZE_BYTES];
...
int aiInit(void) {
  stai_return_code ret_code;

  /* Create the table of activations buffer addresses */
  const stai_ptr acts[] = { activations1, activations2, activations3 };

  /* Initialize runtime library */
  ret_code = stai_runtime_init();
  if (ret_code != STAI_SUCCESS) { ... }

  /* Initialize network model context */
  ret_code = stai_network_init(network_context);
  if (ret_code != STAI_SUCCESS) { ... }

  /* Set network activations buffers */
  ret_code = stai_network_set_activations(network_context, acts, STAI_NETWORK_ACTIVATIONS_NUM);
  if (ret_code != STAI_SUCCESS) { ... }
  ...
}
Split weights buffer
The --split-weights
option allows placing the weights
statically, tensor-by-tensor, in different memory segments
(on- or off-chip) thanks to specific linker directives in the
end-user application.
- It relaxes the constraint of placing one large buffer into a
constrained and non-homogeneous memory sub-system.
- After profiling, it allows improving the global inference time by placing the critical weights into a low latency memory. Or, on the contrary, it can free a critical resource (i.e. internal flash) which can then be used by the application.
The --split-weights
option prevents the generation
of a unique c-array for the whole data of the weights/bias tensors
(<name>_data.c
file). Without the option, the weights
are declared as:
STAI_ALIGNED(8)
const uint8_t s_network_weights[ 794136 ] = {
  0xcf, 0xae, 0x9d, 0x3d, 0x1b, 0x0c, 0xd1, 0xbd, 0x63, 0x99,
  0x36, 0xbd, 0xdb, 0x67, 0x46, 0xbe, 0x3b, 0xe7, 0x0d, 0x3e,
  ...
  0x41, 0xbf, 0xc6, 0x7d, 0x69, 0x3e, 0x18, 0x87, 0x37,
  0xbe, 0x83, 0x63, 0x0f, 0x3f, 0x51, 0xa1, 0xdd, 0xbe
};
and declared in the <name>_data.h
header as follows:
STAI_ALIGNED(8)
extern const uint8_t s_network_weights[ 794136 ];
On the contrary, with --split-weights
a
s_<network>_<layer_name>_[bias|weights|*]_array_weights[]
c-array is created to store the data of each weight/bias tensor. A
global map table is also built, which is used by the run-time to
retrieve the addresses of the different c-arrays.
...
/* conv2d_1_weights_array - FLOAT|CONST */
STAI_ALIGNED(8)
const uint8_t s_network_conv2d_1_weights_array_weights[ 2048 ] = {
  0xcf, 0xae, 0x9d, 0x3d, 0x1b, 0x0c, 0xd1, 0xbd, 0x63, 0x99,
  ...
};
...
/* dense_3_bias_array - FLOAT|CONST */
STAI_ALIGNED(8)
const uint8_t s_network_dense_3_bias_array_weights[ 24 ] = {
  0xa2, 0x72, 0x82, 0x3e, 0x5a, 0x88, 0x41, 0xbf, 0xc6, 0x7d,
  0x69, 0x3e, 0x18, 0x87, 0x37, 0xbe, 0x83, 0x63, 0x0f, 0x3f,
  0x51, 0xa1, 0xdd, 0xbe
};
- Without particular linker directives, these multiple c-arrays
are placed in the
.rodata
section, as for the unique c-array.
- The client API is not changed: the
stai_<name>_get_weights()
function is used to retrieve the addresses of the weights buffers, and the stai_<name>_set_weights()
function is used to set the weights addresses.
- As illustrated in the previous figure, the
const
C-attribute can be manually commented out to use the default C start-up behavior, which copies the data into an initialized RAM data section.
Re-entrance and thread safety considerations
No internal synchronization mechanism is implemented to protect
the entry points against concurrent accesses. If the API is used in
a multi-threaded context, the protection of the instantiated NN(s)
must be guaranteed by the application layer itself. To minimize the
usage of the RAM, the same activations memory chunk
(Size_SHARED) can be used to support multiple networks.
In this case, the user must guarantee that an on-going inference
execution cannot be preempted by the execution of another
network.

Size_SHARED = MAX(STAI_<name>_ACTIVATIONS_SIZE_BYTES) for name = "NET1" … "NET2"
Tip
If the preemption is expected for real-time constraint or latency reasons, each network instance must have its own and private activations buffer.
Debug support
The network runtime library must be considered as an optimized
black-box object in binary format (source files are not delivered).
There are no run-time services allowing to dump the internal states.
The mapping and port of the model is guaranteed by the ST Edge AI code
generator. Some integration issues can be highlighted by the
stai_<name>_get_error()
function.
Versioning
The generated <network>.h
file contains a set
of macros, imported from the stai.h header, that allows knowing the
version of the tool used to generate the specialized NN C-files and
the version of the associated run-time API.
Warning
Backward and/or forward compatibility between the generated code and the run-time library is not fully guaranteed. If a new version of the tool is used to generate new specialized NN c-files, it is highly recommended to also update the associated header files and network run-time library.
/* stai.h file */
#define STAI_TOOLS_VERSION_MAJOR 1
#define STAI_TOOLS_VERSION_MINOR 0
#define STAI_TOOLS_VERSION_MICRO 0
#define STAI_API_VERSION_MAJOR 1
#define STAI_API_VERSION_MINOR 0
#define STAI_API_VERSION_MICRO 0
type | description |
---|---|
STAI_TOOLS_VERSION_XX | global version of the tool package |
STAI_API_VERSION_XX | version of the API which is used by the generated NN c-files to call the network runtime library. |
ST Edge AI STAI C APIs return codes
Each ST Edge AI C API returns an exit code giving the status
of the call. If the API succeeds, the returned
stai_return_code
is STAI_SUCCESS. In case
of an internal error (e.g., because of mismatches in the arguments
provided to the APIs), different return codes can be returned. The C
network context keeps track of the first error: the subsequently
invoked APIs, and also stai_<name>_get_error(stai_network*),
will always return the first generated error. The
stai_return_code
can be used by the client application to
manage and report errors.
C-enum | description |
---|---|
STAI_SUCCESS | No errors triggered. The API succeeded |
STAI_RUNNING_NO_WFE | Currently not supported |
STAI_RUNNING_WFE | Currently not supported |
STAI_DONE | The API completed its tasks |
STAI_ERROR_GENERIC | Generic error code |
STAI_ERROR_NETWORK_INVALID_API_ARGUMENTS | (at least one) invalid argument has been provided to the API |
STAI_ERROR_NETWORK_INVALID_CONTEXT_HANDLE | The provided context pointer is not valid (e.g. it is NULL or it is corrupted) |
STAI_ERROR_NETWORK_INVALID_CONTEXT_SIZE | The provided context has a different (byte) size than expected |
STAI_ERROR_NETWORK_INVALID_CONTEXT_ALIGNMENT | The provided context pointer has an invalid alignment |
STAI_ERROR_NETWORK_INVALID_INFO | The network C info is corrupted |
STAI_ERROR_NETWORK_INVALID_RUN | The stai_<name>_run() API failed |
STAI_ERROR_NETWORK_INVALID_RUNTIME | The runtime initialization failed |
STAI_ERROR_NETWORK_INVALID_ACTIVATIONS_PTR | (at least one of) the activations buffer pointers is invalid |
STAI_ERROR_NETWORK_INVALID_ACTIVATIONS_NUM | Wrong number of activations buffers provided |
STAI_ERROR_NETWORK_INVALID_IN_PTR | (at least one of) the input buffer pointers is invalid |
STAI_ERROR_NETWORK_INVALID_IN_NUM | Wrong number of input buffers provided |
STAI_ERROR_NETWORK_INVALID_OUT_PTR | (at least one of) the output buffer pointers is invalid |
STAI_ERROR_NETWORK_INVALID_OUT_NUM | Wrong number of output buffers provided |
STAI_ERROR_NETWORK_INVALID_STATES_PTR | (at least one of) the state buffer pointers is invalid |
STAI_ERROR_NETWORK_INVALID_STATES_NUM | Wrong number of state buffers provided |
STAI_ERROR_NETWORK_INVALID_WEIGHTS_PTR | (at least one of) the weights buffer pointers is invalid |
STAI_ERROR_NETWORK_INVALID_WEIGHTS_NUM | Wrong number of weights buffers provided |
STAI_ERROR_NETWORK_INVALID_CALLBACK | Invalid callback pointer set |
STAI_ERROR_NOT_IMPLEMENTED | API is not implemented (i.e., for the specific target) |
STAI_ERROR_INVALID_BUFFER_ALIGNMENT | A buffer pointer has an invalid expected alignment |
STAI_ERROR_NOT_CURRENT_NETWORK | Wrong context handle provided (e.g., when orchestrating multiple networks) |
STAI_ERROR_NETWORK_STILL_RUNNING | The last inference has not yet been completed |
STAI_ERROR_STAI_INIT_FAILED | Failed initialization of an init API |
STAI_ERROR_STAI_DEINIT_FAILED | Failed de-initialization of a deinit API |
ST Edge AI STAI_<NAME>_XXX C-defines
Different C-defines are generated in the
<name>.h
header file. They can be used by the
application code to allocate the requested buffers at compile time
or dynamically, or for debug purposes. At run-time, stai_<network>_get_info()
can also be used to retrieve the requested sizes.
C-defines | description |
---|---|
STAI_<NAME>_MODEL_NAME | C-string with the C-name of the model |
STAI_<NAME>_MODEL_SIGNATURE | C-model generated checksum as a hex number |
STAI_<NAME>_ORIGIN_MODEL_NAME | C-string with the original name of the model |
STAI_<NAME>_ORIGIN_MODEL_SIGNATURE | C-string with the checksum of the original model |
STAI_<NAME>_MACC_NUM | C-model estimated complexity (as a number of MAC operations) |
STAI_<NAME>_NODES_NUM | C-model number of operational nodes generated |
STAI_<NAME>_IN/OUT/ACTIVATIONS/WEIGHTS/STATES_NUM | total number of input/output/activations/weights/states buffers |
STAI_<NAME>_IN/OUT/ACTIVATIONS/WEIGHTS/STATES_SIZE | total number of elements of all the in/out/activations/weights/states buffers |
STAI_<NAME>_IN/OUT/ACTIVATIONS/WEIGHTS/STATES_SIZE_BYTES | total size in bytes of all the in/out/activations/weights/states buffers |
STAI_<NAME>_IN/OUT/ACTIVATIONS/WEIGHTS/STATES_ALIGNMENTS | C-table (integer type) specifying the alignment of the in/out/activations/weights/states buffers |
STAI_<NAME>_IN/OUT/ACTIVATIONS/WEIGHTS/STATES_SIZES | C-table (integer type) specifying the number of items per in/out/activations/weights/states buffer (see "IO tensor description" section) |
STAI_<NAME>_IN/OUT/ACTIVATIONS/WEIGHTS/STATES_SIZES_BYTES | C-table (integer type) specifying the size in bytes per in/out/activation/weight/state buffer |
STAI_<NAME>_IN/OUT/ACTIVATION/WEIGHT/STATE_x_SIZE | total number of items of the x-th in/out/activation/weight/state buffer |
STAI_<NAME>_IN/OUT/ACTIVATION/WEIGHT/STATE_x_SIZE_BYTES | size in bytes of the x-th in/out/activation/weight/state buffer |
STAI_<NAME>_IN/OUT/ACTIVATION/WEIGHT/STATE_x_ALIGNMENT | expected C-array alignment for the x-th in/out/activation/weight/state buffer |
STAI_<NAME>_IN/OUT/ACTIVATION/WEIGHT/STATE_x_FLAGS | flags for the x-th in/out/activation/weight/state buffer |
STAI_<NAME>_IN/OUT_x_NAME | C-string (optional) storing the name of the x-th in/out buffer |
STAI_<NAME>_IN/OUT_x_FORMAT | C-format using stai enums for the x-th in/out buffer |
STAI_<NAME>_IN/OUT_x_SHAPE | expected C-shape dimension values for the x-th in/out buffer |
STAI_<NAME>_IN/OUT_x_RANK | expected cardinality of the C-shape of the x-th in/out buffer |
STAI_<NAME>_IN/OUT_x_BATCH | (optional) batch dimension size of the x-th in/out buffer |
STAI_<NAME>_IN/OUT_x_CHANNEL | (optional) channel dimension size of the x-th in/out buffer |
STAI_<NAME>_IN/OUT_x_WIDTH | (optional) width dimension size of the x-th in/out buffer |
STAI_<NAME>_IN/OUT_x_HEIGHT | (optional) height dimension size of the x-th in/out buffer |
STAI_<NAME>_IN/OUT_x_SCALE | (optional) scale quantization factor of the x-th in/out buffer |
STAI_<NAME>_IN/OUT_x_ZERO_POINT | (optional) zero-point quantization factor of the x-th in/out buffer |
ST Edge AI client APIs
This section presents the list of C APIs for interfacing
generated C-network models with a client. They are declared in the
stai.h
header file (model-independent APIs) and in the
<name>.h header file (model-dependent APIs).
stai_runtime_init()
stai_return_code stai_runtime_init(void);
This mandatory function is used by the application to initialize the ST Edge AI C run-time. It must be called just once before using the AI Platform.
#include "network.h"
...
stai_return_code ret_code = stai_runtime_init();
if (ret_code != STAI_SUCCESS) {
  ...
}
stai_runtime_deinit()
stai_return_code stai_runtime_deinit(void);
This mandatory function is used by the application to de-initialize the ST Edge AI C run-time. It must be called just once, when the AI Platform is no longer needed.
#include "network.h"
...
stai_return_code ret_code = stai_runtime_deinit();
if (ret_code != STAI_SUCCESS) {
  ...
}
stai_runtime_get_info(stai_runtime_info* info)
stai_return_code stai_runtime_get_info(stai_runtime_info* info);
This function is used by the application to retrieve some information
about the ST Edge AI C run-time. The information is filled into the
stai_runtime_info
C struct defined as follows:
typedef struct {
  stai_version api_version;      /* X.Y.Z version of the ST Edge AI APIs */
  stai_version runtime_version;  /* X.Y.Z version of the runtime */
  stai_version tools_version;    /* version of the tool compatible with the run-time */
  uint32_t runtime_build;        /* 32bit run-time identifier (i.e. build info) */
  stai_compiler_id compiler_id;  /* compiler ID enum */
  const char* compiler_desc;     /* string with a short description of the compiler */
} stai_runtime_info;
The following code snippet shows how to print some of the
stai_runtime_info
fields:
#include <stdio.h>
#include "network.h"
...
stai_runtime_info info;
stai_return_code ret_code = stai_runtime_get_info(&info);
if (ret_code == STAI_SUCCESS) {
  printf("Runtime info:\n");
  printf(" - api_version    : %d.%d.%d\n",
         info.api_version.major, info.api_version.minor, info.api_version.micro);
  printf(" - runtime_version: %d.%d.%d\n",
         info.runtime_version.major, info.runtime_version.minor, info.runtime_version.micro);
  printf(" - runtime_build  : 0x%08x\n", info.runtime_build);
  printf(" - compiler_id    : 0x%02x\n", info.compiler_id);
  printf(" - compiler_desc  : %s\n\n", info.compiler_desc);
  ...
}
...
stai_<name>_init()
stai_return_code stai_network_init(stai_network* network);
This mandatory function is used by the application to initialize the internal data structures (i.e., context) of a generated network model.
- NOTE: the
network
pointer should be a valid byte array with size STAI_NETWORK_CONTEXT_SIZE and proper alignment STAI_NETWORK_CONTEXT_ALIGNMENT
#include "network.h"
STAI_ALIGNED(STAI_NETWORK_CONTEXT_ALIGNMENT)
static stai_network context[STAI_NETWORK_CONTEXT_SIZE] = {0};
...
stai_return_code ret_code = stai_network_init(context);
stai_<name>_deinit()
stai_return_code stai_network_deinit(stai_network* network);
This mandatory function is used by the application to de-initialize the internal run-time data structures of a network context.
- NOTE: the
network
pointer should be a valid byte array with size STAI_NETWORK_CONTEXT_SIZE and proper alignment STAI_NETWORK_CONTEXT_ALIGNMENT
#include "network.h"
...
STAI_ALIGNED(STAI_NETWORK_CONTEXT_ALIGNMENT)
static stai_network context[STAI_NETWORK_CONTEXT_SIZE] = {0};
...
stai_return_code ret_code = stai_network_deinit(context);
stai_<name>_get_info()
stai_return_code stai_<name>_get_info(stai_network* network, stai_network_info* info);
This function allows retrieving the run-time data attributes of
an instantiated model. Refer to the stai.h
file for the
details of the returned stai_network_info
C-structure.
Warning
- before invoking this call, the network context should already have been initialized.
Typical usage
#include "network.h"
...
STAI_ALIGNED(STAI_NETWORK_CONTEXT_ALIGNMENT)
static stai_network network_context[STAI_NETWORK_CONTEXT_SIZE] = {0};
...
stai_network_info info;
stai_return_code ret_code;
...
/* 1 - Initialize runtime */
stai_runtime_init();
/* 2 - Create and initialize network context */
stai_network_init(network_context);
...
ret_code = stai_network_get_info(network_context, &info);
if (ret_code == STAI_SUCCESS) { /* display the info */ }
Example of possible reported information:
Network information---------------------------------------------------
 c_model_name       : network
 c_model_signature  : e38d1b6095099638ae20a42e53398dd7
 c_model_datetime   : Mon Nov 30 19:40:02 2023
 c_compile_datetime : Nov 30 2023 19:40:31
 runtime_version    : 9.1.0
 tool_version       : 1.0.0
 api_version        : 1.0.0
 flags              : 0x0
 n_macc             : 336088
 n_nodes            : 3
 n_states           : 0
 n_weights          : 1
 n_inputs           : 1
 n_outputs          : 1
stai_<name>_get_[inputs|outputs]()
stai_return_code stai_<name>_get_inputs(stai_network* network, stai_ptr* inputs, stai_size* n_inputs);
stai_return_code stai_<name>_get_outputs(stai_network* network, stai_ptr* outputs, stai_size* n_outputs);
These functions return the network input/output array base
addresses as a table of array pointers. The number of inputs
(respectively outputs) is returned through n_inputs
(respectively n_outputs).
These functions can be used for:
- Feeding data into the network: retrieve the input pointers and write the data to them
- Retrieving the output data after an inference: retrieve the output pointers and read the data from them
stai_<name>_get_[activations|weights|states]()
stai_return_code stai_<name>_get_activations(stai_network* network, stai_ptr* activations, stai_size* n_activations);
stai_return_code stai_<name>_get_weights(stai_network* network, stai_ptr* weights, stai_size* n_weights);
stai_return_code stai_<name>_get_states(stai_network* network, stai_ptr* states, stai_size* n_states);
These functions return the network activations/weights/states array
base addresses as a table of array pointers.
n_activations/n_weights/n_states
return the number of activations/weights/persistent states
buffers.
These functions can be used, e.g., for debugging. The
stai_<name>_[get|set]_states()
functions can also be used
to save/restore the state of a stateful network at any given time.
stai_<name>_set_[inputs|outputs]()
stai_<name>_set_inputs(stai_network* network, stai_ptr* inputs, const stai_size n_inputs);
stai_return_code stai_<name>_set_outputs(stai_network* network, stai_ptr* outputs, const stai_size n_outputs); stai_return_code
These functions set network inputs/outputs array base addresses
as a table of array pointers. The table of pointers must be
allocated by client before invoking the API. n_inputs
is the number of inputs to be set while n_outputs
is
the number of outputs to be set (i.e. the number of pointers in each
table).
Typical use-cases are:
- If the --no-[inputs|outputs]-allocation CLI option is used, these functions must be called to set the addresses of the IO buffers used during the inference; otherwise, the inference results in an error.
- These functions can also be used to overwrite the addresses of the IO buffers, for example to support a double-buffering scheme. Note that the original addresses of the buffers allocated in the activations buffer (default behavior), initially returned by the stai_<name>_get_[inputs|outputs]() functions, are not preserved. When a double-buffering scheme is expected, it is recommended to use the --no-[inputs|outputs]-allocation CLI options.
These APIs shall be called before invoking stai_<name>_run().
Tip
The STAI_<name>_IN_NUM and STAI_<name>_OUT_NUM helper macros can be
used to know at compile time the number of network inputs and
outputs respectively. These values are also returned by the
n_inputs and n_outputs fields of the stai_network_info struct (see
the stai_<name>_get_info() function).
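A ping-pong (double-buffering) scheme can be sketched as follows. This is a sketch only, assuming the model was generated with --no-inputs-allocation and has a single input; buffer and function names are illustrative.

```c
#include "network.h"

/* Two client-allocated input buffers, filled alternately (e.g. by a DMA) */
static float in_buf_a[STAI_NETWORK_IN_1_SIZE];
static float in_buf_b[STAI_NETWORK_IN_1_SIZE];

void run_ping_pong(stai_network* net)
{
  int use_a = 1;
  while (1) {
    stai_ptr inputs[STAI_NETWORK_IN_NUM] = {
      use_a ? (stai_ptr)in_buf_a : (stai_ptr)in_buf_b
    };
    /* Point the runtime at the buffer that was just filled... */
    stai_network_set_inputs(net, inputs, STAI_NETWORK_IN_NUM);
    stai_network_run(net, STAI_MODE_SYNC);
    use_a = !use_a;  /* ...while the other buffer is being refilled */
  }
}
```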
stai_<name>_set_[activations|weights|states]()
stai_return_code stai_<name>_set_activations(stai_network* network, stai_ptr* activations, const stai_size n_activations);
stai_return_code stai_<name>_set_weights(stai_network* network, stai_ptr* weights, const stai_size n_weights);
stai_return_code stai_<name>_set_states(stai_network* network, stai_ptr* states, const stai_size n_states);
These functions set the network activations/weights/states array
base addresses as a table of array pointers. The table of pointers
must be allocated by the client before invoking the API.
n_activations, n_weights, and n_states are the number of activation,
weight, and persistent-state buffers respectively.
Typical use-cases are:
- stai_<name>_set_activations() must be invoked if --allocate-activations is not used, to provide the runtime with the client-allocated activations buffer pointers.
- stai_<name>_set_states() must be invoked if --allocate-states is not used, to provide the runtime with the client-allocated (persistent) states buffer pointers.
- stai_<name>_set_weights() must be invoked if --binary was used, to provide the runtime with the weights buffer pointers.
When used, the client application is responsible for providing these functions with enough addresses in the table and with pointers to properly sized memory chunks (this information is statically defined in the generated files).
See Multiple heap support for an example of use.
Tip
The STAI_<NAME>_ACTIVATIONS/WEIGHTS/STATES_NUM macros can be used to
find the number of elements in the tables to be defined. The
STAI_<NAME>_ACTIVATIONS/WEIGHTS/STATES_SIZES_BYTES macros can be
used to find the size of each element.
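A minimal sketch of providing a client-allocated activations buffer, for a model generated without --allocate-activations. A single activations buffer is assumed here (the generated STAI_NETWORK_ACTIVATIONS_NUM macro gives the actual count), and the heap-allocation scheme is purely illustrative.

```c
#include <stdlib.h>
#include "network.h"

int setup_activations(stai_network* net)
{
  /* One pointer per activations buffer expected by the runtime */
  static stai_ptr activations[STAI_NETWORK_ACTIVATIONS_NUM];

  /* Allocate the (assumed single) activations buffer on the heap */
  activations[0] = (stai_ptr)malloc(STAI_NETWORK_ACTIVATION_1_SIZE_BYTES);
  if (!activations[0])
    return -1;

  return (stai_network_set_activations(net, activations,
          STAI_NETWORK_ACTIVATIONS_NUM) == STAI_SUCCESS) ? 0 : -1;
}
```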
stai_<name>_run()
stai_return_code stai_<name>_run(stai_network* network, const stai_run_mode mode);
This function is called to run the neural-network inference. The API
may be blocking or non-blocking depending on the mode parameter. The
default mode is STAI_MODE_SYNC, which is implemented for all
targets. The non-blocking mode (i.e. STAI_MODE_ASYNC) is currently
not supported.
The returned value is a stai_return_code value: if it is
STAI_SUCCESS, the run has completed correctly; otherwise, the first
error is reported. See also the stai_<name>_get_error(stai_network*)
function.
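A typical guarded invocation could look like the following sketch (error handling is application-specific; the default network c-name is assumed):

```c
#include "network.h"

int run_inference(stai_network* net)
{
  const stai_return_code res = stai_network_run(net, STAI_MODE_SYNC);
  if (res != STAI_SUCCESS) {
    /* Retrieve the first error reported by the runtime */
    const stai_return_code err = stai_network_get_error(net);
    /* Application-specific handling, e.g. log err and recover */
    (void)err;
    return -1;
  }
  return 0;
}
```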
Typical usages
The default use case is illustrated by the "Getting started" code
snippet. The following code is an example with a C-model that has
one input and two output tensors, which need to be allocated by the
client application. The activations buffer has already been
allocated by the tool using the CLI option --allocate-activations
and is configured as part of the stai_network_init() API tasks.
#include <stdio.h>

#include "network.h"

/* Network context allocation */
STAI_ALIGNED(STAI_NETWORK_CONTEXT_ALIGNMENT)
static stai_network network_context[STAI_NETWORK_CONTEXT_SIZE] = {0};

...

/* Data buffer for the input buffer */
static float in_1_data[STAI_NETWORK_IN_1_SIZE];

/* C-table to store the @ of the input buffer */
static stai_ptr in_data[STAI_NETWORK_IN_NUM] = {
  &in_1_data[0]
};

/* Data buffers for the output buffers */
static float out_1_data[STAI_NETWORK_OUT_1_SIZE];
static float out_2_data[STAI_NETWORK_OUT_2_SIZE];

/* C-table to store the @ of the output buffers */
static stai_ptr out_data[STAI_NETWORK_OUT_NUM] = {
  &out_1_data[0],
  &out_2_data[0]
};

...

int aiInit(void) {
  /* 1 - Initialize runtime */
  stai_runtime_init();
  ...
  /* 2 - Create and initialize network context */
  stai_network_init(network_context);

  /* 3 - Update the AI input/output buffers */
  stai_network_set_inputs(network_context, in_data, STAI_NETWORK_IN_NUM);
  stai_network_set_outputs(network_context, out_data, STAI_NETWORK_OUT_NUM);
  ...
}

int aiDeinit(void) {
  /* 1 - De-initialize network context */
  stai_network_deinit(network_context);

  /* 2 - De-initialize runtime */
  stai_runtime_deinit();
  ...
}

void main_loop()
{
  aiInit();

  while (1) {
    /* 1 - Acquire, pre-process and fill the input buffers */
    acquire_and_process_data(in_data);

    /* 2 - Call inference engine */
    stai_network_run(network_context, STAI_MODE_SYNC);

    /* 3 - Post-process the predictions */
    post_process(out_data);
  }

  aiDeinit();
}
stai_<name>_get_context_size()
stai_size stai_<name>_get_context_size(void);
This function can be used by the AI client application to get the size in bytes of the network model internal context, in order to allocate it or report it.
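For instance, the context buffer can be allocated dynamically instead of statically. This is a sketch only, with the default network c-name assumed; it also assumes the allocator returns memory meeting the STAI_NETWORK_CONTEXT_ALIGNMENT requirement, which a real application should verify.

```c
#include <stdlib.h>
#include "network.h"

stai_network* create_network_context(void)
{
  /* Query the size of the model's internal context at run time */
  const stai_size ctx_size = stai_network_get_context_size();

  stai_network* ctx = (stai_network*)malloc(ctx_size);
  if (ctx && stai_network_init(ctx) == STAI_SUCCESS)
    return ctx;

  free(ctx);
  return NULL;
}
```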
stai_<name>_get_error()
stai_return_code stai_<name>_get_error(stai_network* network);
This function can be used by the client application to retrieve
the first error reported during the execution of a
stai_<name>_xxx() function.
- See the stai.h C header for the list of returned error codes (stai_return_code).
Typical ST Edge AI error function handler (debug/log purpose)
#include "network.h"
...
void aiLogErr(const stai_return_code err)
{
  printf("E: STAI error - code=0x%x\r\n", err);
}
Retrieve network and tensors information
The following code snippet shows how to retrieve the network and tensors information from a stai_network_info C-struct.
#include <stdio.h>
#include "network.h"
void dump_tensor_info(const stai_tensor* t) {
  /* Main fields of the stai_tensor C-struct:
       stai_size       size_bytes;
       stai_flags      flags;
       stai_format     format;
       stai_shape      shape;
       stai_array_f32  scale;
       stai_array_s16  zeropoint;
       const char*     name;
  */
  printf(" name: %s\n", t->name);
  printf(" bytes(%u) flags(0x%08x) format(0x%08x)\n", t->size_bytes, t->flags, t->format);
}

{
  /* Initialize the network context */
  ...
  /* Network context should have been already initialized */
  stai_network_info info;
  stai_return_code ret = stai_network_get_info(network_context, &info);
  uint16_t i;

  if (ret == STAI_SUCCESS) {
    printf("Model name: %s\n", info.c_model_name);

    printf("Model inputs : %u\n", info.n_inputs);
    for (i=0; i<info.n_inputs; i++) {
      dump_tensor_info(&info.inputs[i]);
    }

    printf("Model outputs: %u\n", info.n_outputs);
    for (i=0; i<info.n_outputs; i++) {
      dump_tensor_info(&info.outputs[i]);
    }
  }
  ...
}
Base address of the IO buffers
The following code snippet illustrates the minimum instructions
required to retrieve the effective addresses of the IO buffers from
the activations buffer. If the input and/or output buffers are not
allocated in the activations buffer, NULL is returned, unless the
client application has already set the input/output buffer pointers
using the stai_<name>_set_[inputs|outputs]() API. Note that the
network context should have been already initialized.
#include "network.h"

/* Network context allocation */
STAI_ALIGNED(STAI_NETWORK_CONTEXT_ALIGNMENT)
static stai_network network_context[STAI_NETWORK_CONTEXT_SIZE] = {0};

/* Buffer should be allocated by the application */
static uint8_t in_data_1[STAI_NETWORK_IN_1_SIZE_BYTES];

...
{
  stai_ptr inputs[STAI_NETWORK_IN_NUM];
  stai_size n_inputs = 0;

  stai_network_get_inputs(network_context, inputs, &n_inputs);
  ...
}
float32 to 8b data type conversion
The following code snippet illustrates the float (float) to integer (int8_t/uint8_t) format conversion. The input buffer is used as the destination buffer.
#include "network.h"

/* Network context allocation */
STAI_ALIGNED(STAI_NETWORK_CONTEXT_ALIGNMENT)
static stai_network network_context[STAI_NETWORK_CONTEXT_SIZE] = {0};

#define _MIN(x_, y_) \
    ( ((x_)<(y_)) ? (x_) : (y_) )

#define _MAX(x_, y_) \
    ( ((x_)>(y_)) ? (x_) : (y_) )

#define _CLAMP(x_, min_, max_, type_) \
    (type_) (_MIN(_MAX(x_, min_), max_))

#define _ROUND(v_, type_) \
    (type_) ( ((v_)<0) ? ((v_)-0.5f) : ((v_)+0.5f) )

const stai_tensor* get_input_desc(stai_network_info* info, const uint8_t idx)
{
  if (info && (idx > 0) && (idx <= info->n_inputs)) {
    return &info->inputs[idx - 1];
  }
  return NULL;
}

float input_f[STAI_NETWORK_IN_1_SIZE];
int8_t input_q[STAI_NETWORK_IN_1_SIZE]; /* or uint8_t */

stai_network_info net_info;

{
  stai_runtime_init();
  stai_network_init(network_context);
  stai_network_get_info(network_context, &net_info);

  const stai_tensor *input_1 = get_input_desc(&net_info, 1);
  if (input_1 == NULL) {...}

  float scale = input_1->scale.data[0];
  const int32_t zp = input_1->zeropoint.data[0];
  scale = 1.0f / scale;

  /* Loop */
  for (int i=0; i < STAI_NETWORK_IN_1_SIZE; i++)
  {
    const int32_t tmp_ = zp + _ROUND(input_f[i] * scale, int32_t);
    /* for uint8_t */
    input_q[i] = _CLAMP(tmp_, 0, 255, uint8_t);
    /* for int8_t */
    input_q[i] = _CLAMP(tmp_, -128, 127, int8_t);
  }
  ...

  stai_network_deinit(network_context);
  stai_runtime_deinit();
}
8b to float32 data type conversion
The following code snippet illustrates the integer (int8_t/uint8_t) to float (float) format conversion. The output buffer is used as the source buffer.
#include "network.h"

int8_t output_q[STAI_NETWORK_OUT_1_SIZE]; /* or uint8_t */
float output_f[STAI_NETWORK_OUT_1_SIZE];

stai_network_info net_info;

const stai_tensor* get_output_desc(stai_network_info* info, const uint8_t idx)
{
  if (info && (idx > 0) && (idx <= info->n_outputs)) {
    return &info->outputs[idx - 1];
  }
  return NULL;
}

{
  stai_runtime_init();
  stai_network_init(network_context);
  stai_network_get_info(network_context, &net_info);

  const stai_tensor* output_1 = get_output_desc(&net_info, 1);
  if (output_1 == NULL) {...}

  const float scale = output_1->scale.data[0];
  const int32_t zp = output_1->zeropoint.data[0];

  /* Loop */
  for (int i=0; i<STAI_NETWORK_OUT_1_SIZE; i++)
  {
    output_f[i] = scale * ((float)(output_q[i]) - zp);
  }
  ...

  stai_network_deinit(network_context);
  stai_runtime_deinit();
}
C-memory layouts
To store the elements of the different tensors, standard C
array-of-array structures are used. For the float and integer data
types, the associated C type is used. For the bool type, values are
stored as uint8_t, while for binary tensors, to optimize the size,
values are packed into 32b words (see the "c-layout of the s1 type"
section for more details).
Note
Note that the 'BHWC' data format (or 'channel-last') is used for
illustration, but for the 'BCHW' data format (or 'channel-first'),
the C-memory layout respects the same principle.
1d-array tensor
For a 1-D tensor, a standard C-array type with the following memory layout is expected to handle the input and output tensors.
#include "network.h"
/* STAI_<NAME>_IN_1_SIZE is defined in network.h ( where e.g. IN_1_SIZE = H * W * C = C ) */
float xx_data[B * STAI_<NAME>_IN_1_SIZE]; /* n_batch = B, height = 1, width = 1, channels = C */
or:
float xx_data[B][STAI_<NAME>_IN_1_SIZE];
2d-array tensor
For a 2-D tensor, a standard C array-of-array memory arrangement is used to handle the input and output tensors. The two dimensions are mapped to the first two dimensions of the tensor in the original toolbox representation: e.g. H and C in Keras/TensorFlow.
#include "network.h"
/* STAI_<NAME>_IN_1_SIZE defined in network.h ( where e.g. IN_1_SIZE = H * W * C = H * C ) */
float xx_data[STAI_<NAME>_IN_1_SIZE]; /* n_batch = 1, height = H, width = 1, channels = C */

/* STAI_<NAME>_IN_1_HEIGHT, STAI_<NAME>_IN_1_CHANNEL defined in network.h */
float xx_data[STAI_<NAME>_IN_1_HEIGHT][STAI_<NAME>_IN_1_CHANNEL];

float xx_data[B * STAI_<NAME>_IN_1_SIZE]; /* n_batch = B, height = H, width = 1, channels = C */
float xx_data[B][STAI_<NAME>_IN_1_HEIGHT][STAI_<NAME>_IN_1_CHANNEL];
3d-array tensor
For a 3-D tensor, a standard C array-of-array-of-array memory arrangement is used to handle the input and output tensors.
#include "network.h"
/* STAI_<NAME>_IN_1_SIZE defined in network.h ( where IN_1_SIZE = H * W * C ) */
float xx_data[STAI_<NAME>_IN_1_SIZE]; /* n_batch = 1, height = H, width = W, channels = C */

/* STAI_<NAME>_IN_1_HEIGHT, STAI_<NAME>_IN_1_WIDTH, STAI_<NAME>_IN_1_CHANNEL defined in network.h */
float xx_data[STAI_<NAME>_IN_1_HEIGHT][STAI_<NAME>_IN_1_WIDTH][STAI_<NAME>_IN_1_CHANNEL];

float xx_data[B * STAI_<NAME>_IN_1_SIZE]; /* n_batch = B, height = H, width = W, channels = C */
float xx_data[B][STAI_<NAME>_IN_1_HEIGHT][STAI_<NAME>_IN_1_WIDTH][STAI_<NAME>_IN_1_CHANNEL];