How to use the AiRunner package
ST Edge AI Core Technology 2.2.0
r1.0
Overview
This article explains how to use the 'stm_ai_runner' Python package, also known as 'ai_runner', to profile and validate a deployed C-model. As illustrated in the following figure, the model can be deployed either on a target device or on the host. The AiRunner object provides a simple and unified interface for inference and profiling, allowing users to inject data, execute inference, and retrieve predictions.
The 'stm_ai_runner' Python package is also integrated into the ST Edge AI Core CLI (used by the 'validate' command), but it can also be used independently to extend the default validation process. End-users (data scientists or ML/AI designers) can adapt their classical validation Python scripts (with minor changes) to validate the deployed model with a real dataset and metrics.
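As an illustration, a classical validation loop can be adapted to feed real samples and compute a metric. The following sketch is only a minimal example; 'x_test' and 'y_test' are placeholders for the user dataset, and the accuracy metric is just one possible choice.

import numpy as np
from stm_ai_runner import AiRunner

runner = AiRunner()
runner.connect('serial')  # or a path to a generated shared library

correct = 0
for x, y in zip(x_test, y_test):  # x_test/y_test: placeholders for the real dataset
    outputs, _ = runner.invoke(np.expand_dims(x, axis=0))
    correct += int(np.argmax(outputs[0]) == y)
print(f'accuracy: {correct / len(y_test):.3f}')

runner.disconnect()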
Multiple back-ends are supported; in this article, two main configurations are considered:
Execution on the host. Through the 'generate' command, the user can create a shared library (or DLL, using the '--dll' option) containing the specialized c-files. This shared library exports the embedded C-API functions (legacy API or st-ai API), which are bound in the Python environment to expose a common API.
Execution on a physical target. The specialized files are linked with a generic embedded test application (also called the aiValidation application). On the host side, a simple message-based protocol on top of a serial link exposes a set of services for discovering and using the deployed models.
Setting up a work environment
The following Python packages should be installed in a Python 3.x environment to use the 'stm_ai_runner' package; it is recommended to use a virtual environment.
protobuf<3.21
tqdm
colorama
pyserial
numpy
To be able to import the 'stm_ai_runner' package, set the 'PYTHONPATH' environment variable:
export PYTHONPATH=$STEDGEAI_CORE_DIR/scripts/ai_runner:$PYTHONPATH
%STEDGEAI_CORE_DIR% represents the root location where the ST Edge AI Core components are installed, typically a path like "<tools_dir>/STEdgeAI/2.1/".
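A quick way to check the setting is to import the package from the Python environment:

$ python -c "from stm_ai_runner import AiRunner; print(AiRunner)"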
Tip
The stm_ai_runner package communicates with the board using a protocol based on the 'Nanopb' module, version 0.3.x. 'Nanopb' is a plain-C implementation of Google's Protocol Buffers data format; for more information, you can visit the Nanopb website. The stm_ai_runner package is fully compatible with protobuf versions below 3.21. If a more recent version of 'protobuf' is required and the protobuf package cannot be downgraded, the following environment variable can be used:
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
Generating the model for execution on host
To generate a model that can be executed on the host machine, the '--dll' option is used. By default, the shared library is generated in the <workspace-directory-path>\inspector_network\workspace\lib\ folder.

$ stedgeai generate -m <model_path> --target stm32 --c-api st-ai --dll
...
Generated files (6) ------------------------------------------------------------------------------------------
<workspace-directory-path>\inspector_network\workspace\generated\network_data.h
<workspace-directory-path>\inspector_network\workspace\generated\network_data.c
<workspace-directory-path>\inspector_network\workspace\generated\network_details.h
<workspace-directory-path>\inspector_network\workspace\generated\network.h
<workspace-directory-path>\inspector_network\workspace\generated\network.c
<workspace-directory-path>\inspector_network\workspace\lib\libai_network.dll
<output-directory-path>\network_generate_report.txt
Note
The specialized c-files which are used to generate the shared library are also generated in the same directory.
To check the generated shared library, the 'validate' command can be used with the '-d file:st_ai_ws' option (it indicates that the libai_network.dll from the st_ai_ws folder should be used):

$ stedgeai validate -m <model_path> --target stm32 --mode target -d file:st_ai_ws
The 'checker.py' script can also be used without options:

$ python $STEDGEAI_CORE_DIR/scripts/ai_runner/examples/checker.py
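The generated shared library can also be exercised directly from a Python script. A minimal sketch, assuming the library has been generated in the default './st_ai_ws' workspace:

from stm_ai_runner import AiRunner

runner = AiRunner()
runner.connect('st_ai_ws')  # the 'lib:' prefix can be omitted for a valid folder
runner.summary()
runner.disconnect()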
Generating the model for execution on a physical target
To use the 'stm_ai_runner' Python package with a model executing on a physical target, the target should be flashed with a firmware which includes the generic aiValidation built-in application and the specialized C-files.
For the 'stm32n6' target, which requires NPU support, a quick and typical process is described in the article titled "How to evaluate a model on an STM32N6 board". For the other 'stm32xx' targets, the X-CUBE-AI UI plug-in can be leveraged, as detailed in the "Getting started with X-CUBE-AI Expansion Package for Artificial Intelligence (AI)" user manual.
To check the deployed c-model on the target, the 'validate' command can be used with the '-d/--desc serial' option.

$ stedgeai validate -m <model_path> --target stm32 --mode target -d serial
The 'checker.py' script can also be used with the '-d/--desc serial' option:

$ python $STEDGEAI_CORE_DIR/scripts/ai_runner/examples/checker.py -d serial
For an STM32N6 board, the '-d serial:921600' option should be used by default.
Getting started - Minimal script
The following code shows a minimal script that performs a model inference (with random input data) on a physical target and displays the profiling information.
import sys
import argparse

import numpy as np

from stm_ai_runner import AiRunner

desc = 'serial'

# create AiRunner object
runner = AiRunner()
# connection
runner.connect(desc)
# display and retrieve model info (optional)
runner.summary()
model_info: dict = runner.get_info()
input_details: list[dict] = runner.get_input_infos()    # = model_info['inputs']
output_details: list[dict] = runner.get_output_infos()  # = model_info['outputs']
# generate the random input data
inputs: list[np.ndarray] = runner.generate_rnd_inputs(batch_size=2)
# perform the inference
mode: AiRunner.Mode = AiRunner.Mode.PER_LAYER
outputs, profiler = runner.invoke(inputs, mode=mode)
# display the profiling info
runner.print_profiling(inputs, profiler, outputs)
# disconnect
runner.disconnect()
This excerpt comes from the examples in the '$STEDGEAI_CORE_DIR/scripts/ai_runner/examples' folder (see the 'minimal.py' and 'checker.py' files).
AiRunner API
The '$STEDGEAI_CORE_DIR/scripts/ai_runner/examples' folder provides different simple scripts using the AiRunner API.
Connection
connect()
The 'connect()' method binds an AiRunner object to a given ST AI runtime. The 'desc' parameter specifies the back-end or driver to use.
import sys

from stm_ai_runner import AiRunner

desc = ...

runner = AiRunner()
runner.connect(desc)

if not runner.is_connected:
    print('No c-model available, use the --desc/-d option to specify a valid path/descriptor')
    print(f' {runner.get_error()}')
    sys.exit(1)
...
‘desc’ parameter
Format (str type):
<protocol/backend>[:<parameters>]
The first part of the descriptor defines the back-end or driver used to perform the connection with a given runtime embedding one or more deployed models. The definition of the 'parameters' field is driver-specific.
back-end/driver | description |
---|---|
'lib:parameter' | used to bind a shared library exporting the embedded c-api. The 'parameter' argument indicates the full file path or the root folder containing the shared library (ex: 'lib:./my_model'). Note that the 'lib:' field can be omitted if a valid folder or valid file is provided. |
'serial[:parameter]' | used to open a connection with a physical target through a serial link. The target should be flashed with a specific built-in profiling application (aiValidation application) embedding the deployed models. |
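For illustration, some typical descriptor strings (the path and port values are examples):

runner = AiRunner()
runner.connect('lib:./my_model')        # shared library located in ./my_model
# runner.connect('serial')              # serial link, auto-detection of the board
# runner.connect('serial:COM4:921600')  # explicit COM port and baud rate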
Parameters for the serial driver
Format (str type):
:<com port>[:<baud-rate>]
The parameter argument is optional; by default, an autodetection mechanism is applied to discover a connected board at 115200 bauds (921600 for ISPU). The baud rate should be aligned with the value defined in the firmware.
# set the baud rate to 921600
$ stedgeai ... -d serial:921600
# set a specific COM port
$ stedgeai ... -d serial:COM4   # Windows environment
$ stedgeai ... -d /dev/ttyACM0  # Linux-like environment
# set the COM port and the baud rate
$ stedgeai ... -d serial:COM4:921600
Typical connection errors
No shared library found: 'desc' designates a folder without a valid shared library file.
invalid/unsupported "st_ai_ws/:" descriptor

The provided shared library is invalid. The error message indicates that the shared library has been generated without the weights. This can appear when the 'validate' command has been performed in the default ./st_ai_ws/ directory.
E801(HwIOError): No weights are available (1549912 bytes expected)

The STM32 board is not connected (autodetect mode).
E801(HwIOError): No SERIAL COM port detected (STM board is not connected!)

The COM port is already opened by another application (like TeraTerm® for example).
E801(HwIOError): could not open port 'COM6': PermissionError(13, 'Access is denied.', None, 5)

The STM32 board is not flashed with a valid aiValidation firmware.
E801(HwIOError): Invalid firmware - COM6:115200
names
Multiple models can be deployed in a given AI runtime environment running on the board. Each model is deployed with a specific 'c-name' which is used as a selector. The 'names' property can be used to get the list of available c-models.
available_models: list[str] = runner.names
print(available_models)
# ['network0', 'network1', ...]
AiRunnerSession()
To facilitate the use of a specific named c-model, the 'session(name: Optional[str])' method returns a dedicated handler object called AiRunnerSession. This object provides the same methods as the AiRunner object for using a deployed model.
runner = AiRunner()
runner.connect(desc)
...
session: AiRunnerSession = runner.session('network_2')
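Since an AiRunnerSession exposes the same services, a script can, for example, iterate over all deployed models. A minimal sketch:

for name in runner.names:
    session = runner.session(name)
    session.summary()
    inputs = session.generate_rnd_inputs(batch_size=1)
    outputs, profiler = session.invoke(inputs)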
Model information
get_info()
The 'get_info(name: Optional[str] = None)' method allows retrieving the detailed information (dict form) for a given model.
model_info: dict = runner.get_info()  # equivalent to runner.get_info(available_models[0])
Model dict
key | type | description |
---|---|---|
‘version’ | tuple | version of the dict - (2, 0) |
‘name’ | str | c-name of the model (--name option of the code generator) |
‘compile_datetime’ | str | date-time when the model has been compiled |
‘n_nodes’ | int | number of deployed c-nodes to implement the model |
‘inputs’ | list[dict] | input tensor descriptions |
‘outputs’ | list[dict] | output tensor descriptions |
‘hash’ | Optional[str] | hash (md5) of the original model file |
‘weights’ | Optional[int, list[int]] | accumulated size in bytes of the weights/params buffers |
‘activations’ | Optional[int, list[int]] | accumulated size in bytes of the activations buffer |
‘macc’ | Optional[int] | equivalent number of macc |
‘rt’ | str | short description of the used AI runtime API |
‘runtime’ | dict | main properties of the AI runtime/environment running the deployed model |
‘device’ | dict | main properties of the device supporting the AI runtime |
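A short sketch reading a few of the documented keys from the model dict:

info = runner.get_info()
print(f"model '{info['name']}': {info['n_nodes']} c-node(s), {info['macc']} macc")
print(f"weights: {info['weights']} byte(s), activations: {info['activations']} byte(s)")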
Tensor dict
key | type | description |
---|---|---|
‘name’ | str | name of tensor (c-string) |
‘shape’ | tuple | shape |
‘type’ | np.dtype | data type |
‘scale’ | Optional[np.float32] | if quantized, scale value |
‘zero_point’ | Optional[np.int32] | if quantized, zero-point value |
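The 'scale' and 'zero_point' entries can be used to convert quantized data back to float. A minimal sketch (the helper name is illustrative, not part of the API):

import numpy as np

def dequantize(data: np.ndarray, t_info: dict) -> np.ndarray:
    # apply the documented 'scale'/'zero_point' keys; no-op for float tensors
    if t_info.get('scale') is None:
        return data
    return (data.astype(np.float32) - np.float32(t_info['zero_point'])) * t_info['scale']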
AI runtime/environment dict
key | type | description |
---|---|---|
‘protocol’ | str | description of the used back-end/driver |
‘name’ | str | short description of the used AI runtime API |
‘tools_version’ | tuple | version of the tools used to deploy the model (STEdgeAI core version) |
‘rt_lib_desc’ | str | description of the used AI runtime libraries |
‘version’ | tuple | version of the used AI runtime libraries |
‘capabilities’ | list[AiRunner.Caps] | capabilities of the AI runtime |
capability | description |
---|---|
AiRunner.Caps.IO_ONLY | minimal capability (mandatory) allowing to inject the data and to retrieve the predictions |
AiRunner.Caps.PER_LAYER | capability to report the intermediate tensor information without the data |
AiRunner.Caps.PER_LAYER_WITH_DATA | capability to report the intermediate tensor information including the data |
Device dict
key | type | description |
---|---|---|
‘dev_type’ | str | target name |
‘desc’ | str | short description of the device including the main frequencies |
‘dev_id’ | Optional[str] | device id (target specific) |
‘system’ | str | short description of the platform |
‘sys_clock’ | Optional[int] | frequency (Hz) of the MCU |
‘bus_clock’ | Optional[int] | frequency (Hz) of the main system bus |
‘attrs’ | Optional[list[str]] | attributes (target specific) |
get_input_infos(), get_output_infos()
The 'get_input_infos(name: Optional[str] = None)' and 'get_output_infos(name: Optional[str] = None)' methods allow retrieving the detailed information of the input/output tensors (dict form).
model_inputs: list[dict] = runner.get_input_infos()    # equivalent to runner.get_input_infos(available_models[0])
model_outputs: list[dict] = runner.get_output_infos()  # equivalent to runner.get_output_infos(available_models[0])
Perform the inference
invoke()
The 'invoke(inputs: Union[np.ndarray, List[np.ndarray]])' method performs an inference with the input data. It returns a tuple containing the predictions ('outputs') and a Python dictionary with the profiling information ('profiler').
# perform the inference
mode: AiRunner.Mode = AiRunner.Mode.PER_LAYER
outputs, profiler = runner.invoke(inputs, mode=mode)
‘mode’ parameter
The 'mode' parameter consists of OR-ed flags that set the mode of the AI runtime; the accepted values depend on the capabilities returned by the runtime.
mode | description |
---|---|
AiRunner.Mode.IO_ONLY | out-of-the-box execution, only the predictions are dumped, intermediate information is not reported |
AiRunner.Mode.PER_LAYER | descriptions of the intermediate nodes are reported without the data |
AiRunner.Mode.PER_LAYER_WITH_DATA | when supported, the intermediate data are also dumped |
AiRunner.Mode.PERF_ONLY | no input data are sent to the target, and the results are not dumped |
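Since the accepted modes depend on the reported capabilities, a script can select the richest supported mode before calling 'invoke()'. A minimal sketch based on the 'capabilities' entry of the AI runtime dict:

caps = runner.get_info()['runtime']['capabilities']
if AiRunner.Caps.PER_LAYER_WITH_DATA in caps:
    mode = AiRunner.Mode.PER_LAYER_WITH_DATA
elif AiRunner.Caps.PER_LAYER in caps:
    mode = AiRunner.Mode.PER_LAYER
else:
    mode = AiRunner.Mode.IO_ONLY
outputs, profiler = runner.invoke(inputs, mode=mode)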
Profiling dict
key | type | description |
---|---|---|
‘info’ | dict | model/AI runtime information (dict form), refer to ‘Model dict’ section |
‘mode’ | ‘AiRunner.Mode’ | used mode |
‘c_durations’ | List[float] | List with inference time (ms) by sample |
‘c_nodes’ | Optional[List[dict]] | List with the profiled c-node information. One entry per node; PER_LAYER or PER_LAYER_WITH_DATA mode must be used |
‘debug’ | str | c-name of the model (--name option of the code generator) |
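For example, the per-sample inference times can be aggregated from the 'c_durations' entry:

import numpy as np

durations = profiler['c_durations']  # one entry (ms) per sample
print(f'{len(durations)} sample(s), average inference time: {np.mean(durations):.3f} ms')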
Warning
The returned profiling information depends on the AI runtime environment and/or target. For example, if the deployed model is executed on the host, information about the cycles is not returned. This is because such information is not relevant, as the implementation of the kernels is not optimized for the host/development machine.
C-node dict
key | type | description |
---|---|---|
‘name’ | str | name of the node |
‘m_id’ | int | (optional) associated index of the layer from the original model |
‘layer_type’ | int | id type of the node. Definition is AI runtime specific |
‘layer_desc’ | str | short description of the node |
‘type’ | List[np.ndarray] | data type of the associated output tensors |
‘shape’ | List[Tuple[int]] | shape of the associated output tensors |
‘scale’ | Optional[List[np.float32]] | if quantized, scale value |
‘zero_point’ | Optional[List[np.int32]] | if quantized, zero-point value |
‘c_durations’ | List[float] | Inference time (ms) of the node by sample |
‘clks’ | Optional[List[Union[int, List[int]]]] | Number of MCU/CPU clocks to execute the node, AI runtime/target dependent |
‘data’ | Optional[List[np.ndarray]] | when available (capability AiRunner.Caps.PER_LAYER_WITH_DATA), dumped data of the associated output tensors after each node execution |
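A short sketch printing a per-node timing report from the 'c_nodes' entries (the inference must have been invoked with a PER_LAYER mode):

import numpy as np

for node in profiler.get('c_nodes') or []:
    avg_ms = np.mean(node['c_durations'])
    print(f"{node['name']:30} {node['layer_desc']:24} {avg_ms:8.3f} ms")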
Services
summary()
The 'summary(name: Optional[str] = None)' method displays a summary of the information provided by the 'get_info(name: Optional[str] = None)' method.
runner.summary()
generate_rnd_inputs()
The 'AiRunner.generate_rnd_inputs(name: Optional[str])' method is a helper service allowing to generate input data for a given model. The 'val' parameter is used to set the range of the data, which are uniformly distributed over a specific interval [low, high[. The default is [-1.0, 1.0[ for the floating-point type.
inputs: Union[np.ndarray, List[np.ndarray]] = runner.generate_rnd_inputs(name='network', batch_size=2)
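Assuming the 'val' parameter is passed as a (low, high) pair (this exact form is an assumption; check the helper's signature in the sources), the range can be adjusted as follows:

# assumption: 'val' defines the [low, high[ interval as a pair
inputs = runner.generate_rnd_inputs(name='network', batch_size=4, val=(0.0, 1.0))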
print_profiling()
The 'print_profiling(inputs, profiler, outputs)'
method displays a summary of the profiling information returned by
the ‘invoke()’ method.
# perform the inference
outputs, profiler = runner.invoke(inputs)
# display the profiling info
runner.print_profiling(inputs, profiler, outputs)
Examples
Location:
%STEDGEAI_CORE_DIR%/scripts/ai_runner/examples/
checker.py
provides an example of using the ai_runner module, including the profiling information.

# Try to load the shared library located in the default location: ./st_ai_ws.
# It displays a summary and performs two inferences with the random data.
$ python checker.py
# As previously, but it performs a connection with a STM32 board (auto-detect mode)
$ python checker.py -d serial
# Set the expected COM port and baudrate
$ python checker.py -d serial:COM6:115200
tflite_test.py
provides a typical example to compare the outputs of the generated C-model against the predictions from the tf.lite.Interpreter.
mnist
provides a complete example with two scripts allowing to train a model (train.py) and to test it (test.py) with the generated c-model.
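For reference, a hedged sketch of the kind of comparison performed by 'tflite_test.py' (the model path is a placeholder, and this is not the script's exact code):

import numpy as np
import tensorflow as tf
from stm_ai_runner import AiRunner

# bind the generated C-model (here, the shared library in the default workspace)
runner = AiRunner()
runner.connect('st_ai_ws')
inputs = runner.generate_rnd_inputs(batch_size=1)
c_outputs, _ = runner.invoke(inputs)

# reference predictions from the TFLite interpreter
interp = tf.lite.Interpreter(model_path='model.tflite')  # placeholder path
interp.allocate_tensors()
interp.set_tensor(interp.get_input_details()[0]['index'], inputs[0])
interp.invoke()
ref = interp.get_tensor(interp.get_output_details()[0]['index'])

print('max abs error:', np.max(np.abs(c_outputs[0] - ref)))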