Platform Observer API (legacy)
ST Edge AI Core Technology 2.2.0
r1.2
Purpose
For advanced run-time, debug or profiling purposes, an AI client
application can register a call-back function to be notified before
and/or after the execution of a c-node. As detailed in the “C-graph
description” section, each node is identified by its executing
index: 'c_id'. The call-back can be used to measure the
execution time and/or to dump the intermediate values.
Note
During the execution of the
ai_<network>_run()
function, the registered
callback is executed synchronously in the context of the caller.
Use cases
User call-back registration for profiling
The following minimal code snippet registers a basic call-back function
that logs the number of used core cycles after each execution of a
node (a more advanced implementation can be found in the
'aiSystemPerformance.c' file).
```c
#include "ai_platform_interface.h"
...
/*
 * Observer initialization
 */
/* Minimal ctx to store the timestamp (before execution) */
struct u_observer_ctx {
  uint64_t ts;
  uint32_t n_events;
};
struct u_observer_ctx u_observer_ctx;

static ai_u32 u_observer_cb(const ai_handle cookie,
                            const ai_u32 flags,
                            const ai_observer_node *node) {
  uint64_t ts = dwtGetCycles(); /* time stamp entry */
  struct u_observer_ctx *ctx = (struct u_observer_ctx *)cookie;
  if (flags & AI_OBSERVER_POST_EVT) {
    printf("%d - cpu cycles: %lld\r\n", node->c_idx, ts - ctx->ts);
    ctx->n_events++;
  }
  ctx->ts = dwtGetCycles(); /* time stamp exit */
  return 0;
}

/* Register a call-back to be notified before
   and after each execution of a c-node */
int aiObserverSetup() {
  if (!ai_platform_observer_register(network,
                                     u_observer_cb, &u_observer_ctx,
                                     AI_OBSERVER_PRE_EVT | AI_OBSERVER_POST_EVT)) {
    ai_error err = ai_network_get_error(network);
    printf("E: AI ai_platform_observer_register error - type=%d code=%d\r\n",
           err.type, err.code);
    return -1;
  }
  return 0;
}
```
Node-per-node inspection
The ai_platform_observer_node_info() function can be used to walk
through the executed C-graph structure, retrieving the tensor
attributes node per node. A set of helper macros (AI_TENSOR_XXX
from the ai_platform_interface.h file) should be used to retrieve
or to manipulate the returned tensor object: t.
```c
#include "ai_platform_interface.h"
#include "core_common.h"
#include "core_private.h"
{
  ai_observer_node node_info;
  ai_tensor_list *tl;

  node_info.c_idx = 0; /* starting with the first node */
  while (ai_platform_observer_node_info(network, &node_info)) {
    /* Check if the node is a "Time Distributed" operator. In this
     * case, weight/bias tensors are provided through the inner object
     * - the node_info.inner_tensors != NULL condition can also be used
     */
    const ai_bool is_time_dist = ((node_info.type & 0x8000) != 0);
    node_info.type &= 0x7FFF;
    /* Retrieve the list of the input tensors */
    tl = GET_TENSOR_LIST_IN(node_info.tensors);
    if (tl) {
      AI_FOR_EACH_TENSOR_LIST_DO(i, t, tl) {
        ...
      }
    }
    /* Retrieve the list of the output tensors */
    tl = GET_TENSOR_LIST_OUT(node_info.tensors);
    if (tl) {
      AI_FOR_EACH_TENSOR_LIST_DO(i, t, tl) {
        ...
      }
    }
    /* Retrieve the list of the weight/bias tensors */
    if (is_time_dist)
      tl = GET_TENSOR_LIST_WEIGTHS(node_info.inner_tensors);
    else
      tl = GET_TENSOR_LIST_WEIGTHS(node_info.tensors);
    if (tl) {
      AI_FOR_EACH_TENSOR_LIST_DO(i, t, tl) {
        ...
      }
    }
    /* Retrieve the list of the scratch tensors */
    if (is_time_dist)
      tl = GET_TENSOR_LIST_SCRATCH(node_info.inner_tensors);
    else
      tl = GET_TENSOR_LIST_SCRATCH(node_info.tensors);
    if (tl) {
      AI_FOR_EACH_TENSOR_LIST_DO(i, t, tl) {
        ...
      }
    }
    node_info.c_idx++;
  } /* end of the while loop */
  ...
}
```
macro | description |
---|---|
AI_TENSOR_ARRAY_BYTE_SIZE(t) | returns the size in bytes of the data buffer. |
AI_TENSOR_ARRAY_GET_DATA_ADDR(t) | returns the effective address of the data buffer. |
AI_TENSOR_ARRAY_UPDATE_DATA_ADDR(t, addr) | sets a new effective address. It should be 4-byte aligned. The previous address is forgotten and not saved (see next section). |
Warning
ai_platform_observer_node_info() should be called with an
initialized network instance to be sure to have a complete and
ready-to-use initialization of the internal runtime data structures
(in particular the array objects which handle the data of the
tensors).
Copy-before-run use-case
Kernels from the network runtime library are designed to support
flexible data placement, thanks to the usage of scratch buffers or
stack-based techniques. After a profiling session, and if a static
placement approach (based on the ['--split-weights'][API_split_weights]
option) is not sufficient or not adapted, it is also possible to
improve the inference time by copying the critical weights/bias data
buffers into a low-latency memory before the run (copy-before-run).
The following code snippet illustrates the usage of a software
“cache” memory to store the weights/bias of a specific critical
layer before calling the ai_<name>_run() function. A particular
compiler directive (tool-chain dependent) can be used to place the
_w_cache object.
```c
#include <string.h>
#include "ai_platform_interface.h"

#define ALIGN_UP(num, align) \
  (((num) + ((align) - 1)) & ~((align) - 1))

AI_ALIGN(4)
static ai_u8 _w_cache[XXX]; /* reserved buffer to cache the weights */

int aiCacheWeights(void) {
  ai_observer_node node_info;
  node_info.c_idx = ID; /* index of the critical node */
  if (ai_platform_observer_node_info(network, &node_info)) {
    ai_tensor_list *tl;
    tl = GET_TENSOR_LIST_WEIGTHS(node_info.tensors);
    uintptr_t dst_addr = (uintptr_t)&_w_cache[0];
    AI_FOR_EACH_TENSOR_LIST_DO(i, t, tl) {
      /* Retrieve the @/size of the data */
      const uintptr_t src_addr = (uintptr_t)AI_TENSOR_ARRAY_GET_DATA_ADDR(t);
      const ai_size sz = AI_TENSOR_ARRAY_BYTE_SIZE(t);
      /* Copy the data tensor in the SW cache */
      memcpy((void *)dst_addr, (const void *)src_addr, sz);
      /* Set the new effective address */
      AI_TENSOR_ARRAY_UPDATE_DATA_ADDR(t, dst_addr);
      dst_addr += ALIGN_UP(sz, 4);
    }
  }
  return 0;
}
```
Dumping intermediate output
The following code snippet illustrates a simple call-back to dump an
output of a given internal layer C_ID. The internal tensor
description is converted to ai_buffer-type data.
```c
#include "ai_platform_interface.h"
...
#define C_ID (12) /* c-id of the operator which must be dumped */

static ai_u32 u_observer_cb(const ai_handle cookie,
                            const ai_u32 flags,
                            const ai_observer_node *node) {
  ai_tensor_list *tl;

  if (node->c_idx == C_ID) {
    tl = GET_TENSOR_LIST_OUT(node->tensors);
    AI_FOR_EACH_TENSOR_LIST_DO(i, t, tl) {
      ai_float scale = AI_TENSOR_INTEGER_GET_SCALE(t, 0);
      ai_i32 zero_point = 0;

      if (AI_TENSOR_FMT_GET_SIGN(t))
        zero_point = AI_TENSOR_INTEGER_GET_ZEROPOINT_I8(t, 0);
      else
        zero_point = AI_TENSOR_INTEGER_GET_ZEROPOINT_U8(t, 0);

      const ai_buffer_format fmt = AI_TENSOR_GET_FMT(t);
      const ai_i32 shape_size_ = AI_SHAPE_SIZE(AI_TENSOR_SHAPE(t));
      const ai_i32 b_ = AI_SHAPE_IN_CH(AI_TENSOR_SHAPE(t));
      const ai_i32 h_ = AI_SHAPE_H(AI_TENSOR_SHAPE(t));
      const ai_i32 w_ = AI_SHAPE_W(AI_TENSOR_SHAPE(t));
      const ai_i32 ch_ = AI_SHAPE_CH(AI_TENSOR_SHAPE(t));
      /* new dimensions if available: shape_size_ >= 5, else 1 is returned */
      const ai_i32 d_ = AI_SHAPE_D(AI_TENSOR_SHAPE(t));
      const ai_i32 e_ = AI_SHAPE_E(AI_TENSOR_SHAPE(t));
      const ai_size size = AI_TENSOR_ARRAY_BYTE_SIZE(t);
      const ai_handle data = AI_TENSOR_ARRAY_GET_DATA_ADDR(t);
      ...
      /* User code to dump the data */
      ...
    }
  }
  return 0;
}
...
/* The registered call-back is only raised for the POST event */
ai_platform_observer_register(network,
                              u_observer_cb, &u_observer_ctx,
                              AI_OBSERVER_POST_EVT);
...
```
End-of-process input buffer notification
The following code snippet illustrates a simple call-back to notify
the user application when an input buffer has been processed by the
C_ID layer. This can be useful to anticipate a HW capture process
(DMA-based) before the end of the inference.
Warning
The input buffer should not be allocated in the activations buffer, otherwise there is no guarantee that the memory region will not be used by the other operators before the end of the inference.
```c
#include "ai_platform_interface.h"
...
#define C_ID (0) /* c-id of the operator which processes the input buffer */

static ai_u32 u_observer_cb(const ai_handle cookie,
                            const ai_u32 flags,
                            const ai_observer_node *node) {
  if (node->c_idx == C_ID) {
    /* start a new capture process to fill the input buffer before the
       end of inference */
    ...
  }
  return 0;
}
...
/* The registered call-back is only raised for the POST event */
ai_platform_observer_register(network,
                              u_observer_cb, &u_observer_ctx,
                              AI_OBSERVER_POST_EVT);
...
```
Platform Observer API
ai_observer_node
The ai_platform_observer_node_info() function and the registered
call-back function use the following ai_observer_node data structure
to report the tensor attributes for a given node: c_idx.
```c
/* @file ai_platform_interface.h */
typedef struct ai_observer_node_s {
  ai_u16 c_idx;  /*!< node index (position in the execution list) */
  ai_u16 type;   /*!< node type info */
  ai_u16 id;     /*!< node id assigned by code generator to reference the model layer */
  ai_u16 unused; /*!< unused field for alignment */
  const ai_tensor_chain* inner_tensors; /*!< pointer to the inner tensors if available */
  const ai_tensor_chain* tensors;       /*!< pointer to a 4 sized array */
} ai_observer_node;
```
field | description |
---|---|
c_idx | index of the associated c-node (also called c-id). |
type | defines the type of the c-operator (see the layers_list.h file: 100XX values). |
id | index of the original operator from the imported model. |
tensors | entry point to retrieve the list of [I], [O], [W] and [S] tensors. |
inner_tensors | if the operator is a “Time Distributed” operator, [W] and [S] tensors are returned through this entry, else this field is NULL. |
If (type & 0x8000) != 0, the associated operator is a “Time
Distributed” operator, and both the tensors and inner_tensors
fields should be used to retrieve all of the tensors: [I], [O],
[W] and [S] lists (see the “Node-per-node inspection” section).
Warning
The inner_tensors field is always NULL and the most significant
bit of type is not updated when the call-back is called.
ai_platform_observer_node_info()
```c
ai_bool ai_platform_observer_node_info(ai_handle network,
                                       ai_observer_node *node_info);
```
This function populates the referenced ai_observer_node structure to
retrieve the node and associated tensor attributes. The requested
node index is defined through the node_info.c_idx field. If the
network parameter is not a valid network instance or the index is
out of range, ai_false is returned.
ai_platform_observer_register()
```c
ai_bool ai_platform_observer_register(ai_handle network,
                                      ai_observer_node_cb cb,
                                      ai_handle cookie,
                                      ai_u32 flags);
```
This function registers a user call-back function. Only one call-back can be registered at a time for a given network instance.

cb: pointer to a user callback function (see the “User call-back registration for profiling” code snippet).
cookie: reference to a user context/object which is returned without modification.
flags: bit-wise mask to indicate the type of requested events.
flags | event type |
---|---|
AI_OBSERVER_INIT_EVT | initialization (at the end of the call of ai_<name>_init()). |
AI_OBSERVER_PRE_EVT | before the execution of the kernel (during the call of ai_<name>_run()). |
AI_OBSERVER_POST_EVT | after the execution of the kernel (during the call of ai_<name>_run()). |
```c
typedef ai_u32 (*ai_observer_node_cb)(const ai_handle cookie,
                                      const ai_u32 flags,
                                      const ai_observer_node *node);
```

When the call-back is called, the previous 'flags' event types are
extended with the following values:
flags | event type |
---|---|
AI_OBSERVER_FIRST_EVT | event related to the first node. |
AI_OBSERVER_LAST_EVT | event related to the last node. |
ai_platform_observer_unregister()
```c
ai_bool ai_platform_observer_unregister(ai_handle network,
                                        ai_observer_node_cb cb,
                                        ai_handle cookie);
```
This function un-registers the registered user call-back function.
The same 'cb' pointer used at registration time must be passed.