ST Edge AI Core

Platform Observer API (legacy)


ST Edge AI Core Technology 2.2.0



r1.2

Purpose

For advanced run-time, debug or profiling purposes, an AI client application can register a call-back function to be notified before and/or after the execution of a c-node. As detailed in the “C-graph description” section, each node is identified by its execution index: 'c_id'. The call-back can be used to measure the execution time and/or to dump the intermediate values.

Note

During the execution of the ai_<network>_run() function, the registered callback is executed synchronously in the context of the caller.

Use cases

User call-back registration for profiling

The minimal code snippet is updated to register a basic call-back function that logs the number of core cycles used after the execution of each node (a more advanced implementation can be found in the 'aiSystemPerformance.c' file).

#include "ai_platform_interface.h"
...
/*
 * Observer initialization
 */

/* Minimal ctx to store the timestamp (before execution) */
struct u_observer_ctx {
  uint64_t ts;
  uint32_t n_events;
};

struct u_observer_ctx u_observer_ctx;

static ai_u32 u_observer_cb(const ai_handle cookie,
    const ai_u32 flags,
    const ai_observer_node *node) {

  uint64_t ts = dwtGetCycles();  /* time stamp entry */
  struct u_observer_ctx *ctx = (struct u_observer_ctx *)cookie;

  if (flags & AI_OBSERVER_POST_EVT) {
    printf("%d - cpu cycles: %lld\r\n", node->c_idx, ts - ctx->ts);
    ctx->n_events++;
  }
  ctx->ts = dwtGetCycles(); /* time stamp exit */
  return 0;
}

/* Register a call-back to be notified before
   and after the execution of each c-node */
int aiObserverSetup() {
  ai_error err;

  if (!ai_platform_observer_register(network,
     u_observer_cb, &u_observer_ctx,
     AI_OBSERVER_PRE_EVT | AI_OBSERVER_POST_EVT)) {
    err = ai_network_get_error(network);
    printf("E: AI ai_platform_observer_register error - type=%d code=%d\r\n", err.type, err.code);
    return -1;
  }
  return 0;
}
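
Once the call-back is registered, each call to the ai_<network>_run() function raises the requested events synchronously. A minimal usage sketch follows, assuming that the generated C-name is 'network' and that the 'network' handle and the 'ai_input'/'ai_output' buffers are already created and initialized elsewhere:

/* Minimal usage sketch (assumption: C-name is 'network', the 'network'
   handle and the 'ai_input'/'ai_output' buffers are already set up) */
aiObserverSetup();  /* register the observer call-back */

ai_i32 n_batch = ai_network_run(network, ai_input, ai_output);
if (n_batch != 1) {
  /* error handling (see ai_network_get_error()) */
}
printf("observed POST events: %d\r\n", (int)u_observer_ctx.n_events);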

Node-per-node inspection

The ai_platform_observer_node_info() function can be used to walk through the executed C-graph structure, retrieving the tensor attributes node by node. A set of helper macros (AI_TENSOR_XXX from the ai_platform_interface.h file) should be used to retrieve or manipulate the returned tensor object: t.

#include "ai_platform_interface.h"
#include "core_common.h"
#include "core_private.h"

{
  ai_observer_node node_info;
  ai_tensor_list *tl;

  node_info.c_idx = 0; /* starting with the first node */
  while (ai_platform_observer_node_info(network, &node_info)) {
    /* Check if the node is a "Time Distributed" operator. In this
     * case, weight/bias tensors are provided through the inner object
     * - node_info.inner_tensors != NULL condition can be also used
     */
    const ai_bool is_time_dist = ((node_info.type & 0x8000) != 0);
    node_info.type &= 0x7FFF;
    /* Retrieve the list of the input tensors */
    tl = GET_TENSOR_LIST_IN(node_info.tensors);
    if (tl) {
      AI_FOR_EACH_TENSOR_LIST_DO(i, t, tl) {
        ...
      }
    }
    /* Retrieve the list of the output tensors */
    tl = GET_TENSOR_LIST_OUT(node_info.tensors);
    if (tl) {
      AI_FOR_EACH_TENSOR_LIST_DO(i, t, tl) {
        ...
      }
    }
    /* Retrieve the list of the weight/bias tensors */
    if (is_time_dist)
      tl = GET_TENSOR_LIST_WEIGTHS(node_info.inner_tensors);
    else
      tl = GET_TENSOR_LIST_WEIGTHS(node_info.tensors);
    if (tl) {
      AI_FOR_EACH_TENSOR_LIST_DO(i, t, tl) {
        ...
      }
    }
    /* Retrieve the list of the scratch tensors */
    if (is_time_dist)
      tl = GET_TENSOR_LIST_SCRATCH(node_info.inner_tensors);
    else
      tl = GET_TENSOR_LIST_SCRATCH(node_info.tensors);
    if (tl) {
      AI_FOR_EACH_TENSOR_LIST_DO(i, t, tl) {
        ...
      }
    }
    node_info.c_idx++;
  } /* end of the while loop */
  ...
}

macro description
AI_TENSOR_ARRAY_BYTE_SIZE(t) returns the size in bytes of the data buffer.
AI_TENSOR_ARRAY_GET_DATA_ADDR(t) returns the effective address of the data buffer.
AI_TENSOR_ARRAY_UPDATE_DATA_ADDR(t, addr) sets a new effective address. It should be 4-byte aligned. The previous address is not saved (see next section).
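
As an illustration, a possible loop body which logs the byte size and the effective address of each tensor with these macros ('i' and 't' are the iterator names used in the previous snippet):

/* Possible loop body: log the byte size and the effective address
   of each tensor returned by the iterator */
AI_FOR_EACH_TENSOR_LIST_DO(i, t, tl) {
  const ai_size sz = AI_TENSOR_ARRAY_BYTE_SIZE(t);
  const ai_handle addr = AI_TENSOR_ARRAY_GET_DATA_ADDR(t);
  printf(" tensor: addr=0x%08x size=%d\r\n",
         (unsigned int)(uintptr_t)addr, (int)sz);
}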

Warning

ai_platform_observer_node_info() should be called with an initialized instance to be sure that the internal runtime data structures are complete and ready to use (in particular the array objects which handle the tensor data).
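
For reference, a minimal initialization sketch before walking the C-graph, assuming that the generated C-name is 'network'; the exact function and macro names come from the generated ai_network.h/ai_network_data.h files and may differ depending on the version:

/* Minimal initialization sketch (assumption: C-name is 'network',
   activations buffer placed by the application) */
AI_ALIGN(4)
static ai_u8 activations[AI_NETWORK_DATA_ACTIVATION_1_SIZE];

ai_handle network = AI_HANDLE_NULL;
const ai_handle acts[] = { activations };

ai_error err = ai_network_create_and_init(&network, acts, NULL);
if (err.type != AI_ERROR_NONE) {
  /* error handling */
}
/* the instance is now initialized, ai_platform_observer_node_info()
   can be safely called */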

Copy-before-run use-case

Kernels from the network runtime library are designed to take flexible data placement into account, thanks to the use of scratch buffers or stack-based techniques. After a profiling session, and if a static placement approach (based on the '--split-weights' option) is not sufficient or suitable, it is also possible to improve the inference time by copying the critical weights/bias data buffers into a low-latency memory before running the inference (copy-before-run).

The following code snippet illustrates the use of a software “cache” memory to store the weights/bias of a specific critical layer before calling the ai_<name>_run() function. A particular compiler directive (tool-chain dependent) can be used to place the _w_cache object.

#include <string.h>
#include <stdint.h>
#include "ai_platform_interface.h"

#define ALIGN_UP(num, align) \
    (((num) + ((align) - 1)) & ~((align) - 1))

AI_ALIGN(4)
static ai_u8 _w_cache[XXX]; /* reserve buffer to cache the weights */

int aiCacheWeights(void) {
  ai_observer_node node_info;
  node_info.c_idx = ID; /* index of the critical node */
  if (ai_platform_observer_node_info(network, &node_info)) {
    ai_tensor_list *tl;
    tl = GET_TENSOR_LIST_WEIGTHS(node_info.tensors);
    uintptr_t dst_addr = (uintptr_t)&_w_cache[0];
    AI_FOR_EACH_TENSOR_LIST_DO(i, t, tl) {
      /* Retrieve the address/size of the data */
      const uintptr_t src_addr = (uintptr_t)AI_TENSOR_ARRAY_GET_DATA_ADDR(t);
      const ai_size sz = AI_TENSOR_ARRAY_BYTE_SIZE(t);
      /* Copy the tensor data in the SW cache */
      memcpy((void *)dst_addr, (const void *)src_addr, sz);
      /* Set the new effective address */
      AI_TENSOR_ARRAY_UPDATE_DATA_ADDR(t, (ai_handle)dst_addr);
      dst_addr += ALIGN_UP(sz, 4);
    }
  }
  return 0;
}
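
For instance, with a GNU-based tool-chain, a section attribute can be used to place the _w_cache buffer in a low-latency memory; the '.dtcm_section' name below is only an assumption and depends on the application linker script:

/* GCC-style placement example (assumption: a '.dtcm_section' output
   section mapped on a low-latency RAM is defined in the linker script) */
AI_ALIGN(4)
__attribute__((section(".dtcm_section")))
static ai_u8 _w_cache[XXX]; /* reserve buffer to cache the weights */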

Dumping intermediate output

The following code snippet illustrates a simple call-back to dump an output of a given internal layer C_ID. The internal tensor description is converted to 'ai_buffer'-type data.

#include "ai_platform_interface.h"
...

#define C_ID (12)  /* c-id of the operator which must be dumped */

static ai_u32 u_observer_cb(const ai_handle cookie,
    const ai_u32 flags,
    const ai_observer_node *node) {

    ai_tensor_list *tl;

    if (node->c_idx == C_ID) {
      tl = GET_TENSOR_LIST_OUT(node->tensors);
      AI_FOR_EACH_TENSOR_LIST_DO(i, t, tl) {
          ai_float scale = AI_TENSOR_INTEGER_GET_SCALE(t, 0);
          ai_i32 zero_point = 0;

          if (AI_TENSOR_FMT_GET_SIGN(t))
            zero_point = AI_TENSOR_INTEGER_GET_ZEROPOINT_I8(t, 0);
          else
            zero_point = AI_TENSOR_INTEGER_GET_ZEROPOINT_U8(t, 0);

          const ai_buffer_format fmt = AI_TENSOR_GET_FMT(t);
          const ai_i32 shape_size_ = AI_SHAPE_SIZE(AI_TENSOR_SHAPE(t));
          const ai_i32 b_ = AI_SHAPE_IN_CH(AI_TENSOR_SHAPE(t));
          const ai_i32 h_ = AI_SHAPE_H(AI_TENSOR_SHAPE(t));
          const ai_i32 w_ = AI_SHAPE_W(AI_TENSOR_SHAPE(t));
          const ai_i32 ch_ = AI_SHAPE_CH(AI_TENSOR_SHAPE(t));
          /* extra dimensions if available (shape_size_ >= 5), else 1 is returned */
          const ai_i32 d_ = AI_SHAPE_D(AI_TENSOR_SHAPE(t));
          const ai_i32 e_ = AI_SHAPE_E(AI_TENSOR_SHAPE(t));

          const ai_size size = AI_TENSOR_ARRAY_BYTE_SIZE(t);
          const ai_handle data = AI_TENSOR_ARRAY_GET_DATA_ADDR(t);
          ...
          /** @brief User code to dump the data */
          ...
      }
    }

  return 0;
}

...
/* The registered call-back is only raised on the POST event */
ai_platform_observer_register(network,
     u_observer_cb, &u_observer_ctx, AI_OBSERVER_POST_EVT);
...
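
For a quantized (8-bit) output tensor, the dumped raw values can be converted back to float with: value = scale * (q - zero_point). A possible sketch of the user dump code for a signed 8-bit tensor, reusing the scale, zero_point, size and data variables from the previous snippet:

/* Possible dump code for a signed 8-bit tensor: dequantize and print
   the values ('size' is in bytes, i.e. the number of 8-bit elements) */
const ai_i8 *q_data = (const ai_i8 *)data;
for (ai_size pos = 0; pos < size; pos++) {
  const ai_float val = scale * (ai_float)((ai_i32)q_data[pos] - zero_point);
  printf(" %f", (double)val);
}
printf("\r\n");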

End-of-process input buffer notification

The following code snippet illustrates a simple call-back to notify the user application when an input buffer has been processed by the C_ID layer. This can be useful to anticipate a HW capture process (DMA-based) before the end of the inference.

Warning

The input buffer should not be allocated in the activations buffer, otherwise there is no guarantee that the memory region will not be reused by other operators before the end of the inference.

#include "ai_platform_interface.h"
...

#define C_ID (0)  /* c-id of the operator which processes the input buffer */

static ai_u32 u_observer_cb(const ai_handle cookie,
    const ai_u32 flags,
    const ai_observer_node *node) {

  if (node->c_idx == C_ID) {
    /* Start a new capture process to fill the input buffer before the
       end of the inference */
    ...
  }

  return 0;
}

...
/* The registered call-back is only raised on the POST event */
ai_platform_observer_register(network,
     u_observer_cb, &u_observer_ctx, AI_OBSERVER_POST_EVT);
...

Platform Observer API

ai_observer_node

The ai_platform_observer_node_info() function and the registered call-back function use the following ai_observer_node data structure to report the tensor attributes for a given node (c_idx).

/* @file ai_platform_interface.h */

typedef struct ai_observer_node_s {
  ai_u16            c_idx;   /*!< node index (position in the execution list) */
  ai_u16            type;    /*!< node type info */
  ai_u16            id;      /*!< node id assigned by code generator to reference the model layer */
  ai_u16            unused;  /*!< unused field for alignment */
  const ai_tensor_chain* inner_tensors; /*!< pointer to the inner tensors if available */
  const ai_tensor_chain* tensors;       /*!< pointer to a 4-sized array */
} ai_observer_node;

field description
c_idx index of the associated c-node (also called c-id).
type defines the type of the c-operator (see the layers_list.h file: 100XX values).
id index of the original operator from the imported model.
tensors entry point to retrieve the lists of [I], [O], [W] and [S] tensors.
inner_tensors if the operator is a “Time Distributed” operator, the [W] and [S] tensors are returned through this entry, else this field is NULL.

If (type & 0x8000) != 0, the associated operator is a “Time Distributed” operator and both the tensors and inner_tensors fields should be used to retrieve all of the tensors: [I], [O], [W] and [S] lists (see the “Node-per-node inspection” section).

Warning

The inner_tensors field is always NULL and the most significant bit of type is not updated when the call-back is called.

ai_platform_observer_node_info()

ai_bool ai_platform_observer_node_info(
    ai_handle network, ai_observer_node *node_info);

This function populates the referenced ai_observer_node structure to retrieve the node and associated tensor attributes. The requested node index is defined through the node_info.c_idx field. If the network parameter is not a valid network instance or the index is out-of-range, ai_false is returned.

ai_platform_observer_register()

ai_bool ai_platform_observer_register(
    ai_handle network,
    ai_observer_node_cb cb,
    ai_handle cookie,
    ai_u32 flags);

This function registers a user call-back function. Only one call-back can be registered at a time for a given network instance.

  • cb pointer to a user callback function (see the “User call-back registration” code snippet)
  • cookie reference to a user context/object which is passed back to the call-back without modification.
  • flags bit-wise mask indicating the type of requested events.

flags event type
AI_OBSERVER_INIT_EVT initialization (at the end of the call of ai_<name>_init())
AI_OBSERVER_PRE_EVT before the execution of the kernel (during the call of ai_<name>_run())
AI_OBSERVER_POST_EVT after the execution of the kernel (during the call of ai_<name>_run())

typedef ai_u32 (*ai_observer_node_cb)(const ai_handle cookie,
    const ai_u32 flags,
    const ai_observer_node *node);

When the call-back is called, the previous 'flags' event types are extended with the following values:

flags event type
AI_OBSERVER_FIRST_EVT event related to the first node.
AI_OBSERVER_LAST_EVT event related to the last node.
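
These extra flags can be used, for example, to measure the complete inference duration with a single registered call-back. A minimal sketch, reusing the u_observer_ctx structure and the dwtGetCycles() time-stamp function from the previous snippets:

/* Minimal sketch: measure the complete inference duration between the
   PRE event of the first node and the POST event of the last node */
static ai_u32 u_observer_total_cb(const ai_handle cookie,
    const ai_u32 flags,
    const ai_observer_node *node) {
  struct u_observer_ctx *ctx = (struct u_observer_ctx *)cookie;
  (void)node;
  if ((flags & AI_OBSERVER_PRE_EVT) && (flags & AI_OBSERVER_FIRST_EVT))
    ctx->ts = dwtGetCycles();   /* first node, before execution */
  if ((flags & AI_OBSERVER_POST_EVT) && (flags & AI_OBSERVER_LAST_EVT))
    printf("inference: %llu cpu cycles\r\n",
           (unsigned long long)(dwtGetCycles() - ctx->ts)); /* last node, after execution */
  return 0;
}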

ai_platform_observer_unregister()

ai_bool ai_platform_observer_unregister(ai_handle network,
    ai_observer_node_cb cb, ai_handle cookie);

This function un-registers the registered user call-back function. The same 'cb' pointer (and 'cookie') used for the registration should be passed.
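
A minimal sketch to un-register the call-back used in the previous snippets:

/* Un-register the observer; the same 'cb'/'cookie' values passed at
   registration time are used */
if (!ai_platform_observer_unregister(network, u_observer_cb, &u_observer_ctx)) {
  ai_error err = ai_network_get_error(network);
  printf("E: ai_platform_observer_unregister error - type=%d code=%d\r\n",
         err.type, err.code);
}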