2.2.0
ST Neural-ART NPU - Weights encryption support


ST Edge AI Core



for STM32 target, based on ST Edge AI Core Technology 2.2.0



r1.0

Introduction

This article describes the low-level services and API that enable the NPU encryption/decryption unit to deploy encrypted weights. Encryption of the activations, isolation of the NPU stack (including the model itself), and management and provisioning of the required keys are out of scope for this article.

Why encrypt weights?

Encrypting weights is good practice when intellectual property protection is required. The weights of a model are usually the outcome of a thorough and time-consuming training phase that produces a model fine-tuned for the targeted use case.

How can the ST Neural-ART accelerator™ help?

As described in the “RM0408 Reference Manual - STM32N647/657xx Arm®-based 32-bit MCUs,” the RIF infrastructure allows trust in the NPU subsystem. The stream engine units can encrypt/decrypt data on-the-fly during transfers, enabling encrypted weights to be stored in external memory with minimal system overhead. Encryption keys are stored within the bus interfaces by a trusted application.

Data encryption

Note

There are two bus interfaces, and both can be configured with keys. Stream engines 0-4 use bus interface 0 (and the associated keys); stream engines 5-9 and the epoch controller use bus interface 1 (and the associated keys).
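
The mapping in the note above can be written down as a small lookup. This is only an illustrative sketch: the function name and the unit labels ("streng", "epoch_controller") are hypothetical and are not part of any ST tool or API.

```python
def bus_interface_for_unit(unit: str, index: int = 0) -> int:
    """Return the bus interface (0 or 1) serving a given NPU unit.

    `unit` is "streng" (stream engine, index 0-9) or "epoch_controller".
    Both labels are hypothetical names chosen for this sketch.
    """
    if unit == "epoch_controller":
        return 1
    if unit == "streng" and 0 <= index <= 9:
        return 0 if index <= 4 else 1
    raise ValueError(f"unknown unit {unit!r} with index {index}")


# Stream engines 2 and 7 sit on different bus interfaces,
# so keys are needed on both of them.
needed = {bus_interface_for_unit("streng", i) for i in (2, 7)}
print(sorted(needed))  # [0, 1]
```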

When requested to do so, each stream engine can fetch an encrypted chunk of data from memory and decrypt it while retrieving it. This, in turn, allows the other units of the NPU to use decrypted data for the computations required during an epoch. Protection of the decrypted data is ensured by design: there is no easy way (for example, no debug facility) to retrieve decrypted data flowing through the internal structure of the NPU during computations. Decrypted data does not leave the internal structure of the NPU and is never accessible unencrypted outside of the NPU. The stream engines can also encrypt data during a memory transfer: it is therefore possible to read unencrypted data with one stream engine, encrypt it on-the-fly with another stream engine, and write the result to another location in memory.

Role of the NPU compiler

To support the encryption of the weights, the NPU compiler provides a specific option ('--encrypt-weights') which generates the extra code needed to configure the stream engines to fetch the encrypted data and decrypt it on-the-fly for the processing units. Only three or four cycles of latency are added. With this option, all weights/parameters regions are considered encrypted by the NPU compiler. For the software epochs, as the weights cannot be decrypted by the MCU, extra epochs are generated to decrypt the weights they use into a specific activation memory region, allowing direct access. This additional activation region is used temporarily during processing, which increases the required memory size.

Software epoch encryption

Warning

The process of using extra epochs to decrypt a memory region for a software component can also be applied to certain parameters (excluding weights) needed to configure a specific processing unit. However, this decryption process is not optimized by the NPU compiler, resulting in additional epochs that can significantly impact inference time. To mitigate this issue, you can use the '--no-encrypt-acc-confs' option. This option keeps the parameters unencrypted, allowing the software component to access them directly and avoid any additional overhead.

Role of the application developer

At “installation” time, the integrator or application developer is in charge of encrypting the memory chunks associated with a weights/params region.
At runtime, the application is only responsible for setting the keys on the bus interfaces before performing an inference.

Encryption algorithm description and parameters

As stated above, the encryption algorithm is embedded in hardware in the NPU. Here is a short description of the parameters that shall be provided as inputs of the algorithm to perform an encryption.

Parameter       Description
Data            The data to be encrypted
Address         The final address at which the data is stored
Encryption ID   Encryption algorithm salt
Rounds          Number of encryption rounds to be done (9 or 12)
Increment       Number of frames to encrypt before incrementing the encryption ID (0 means no increment)
Key selection   Two keys are available on the bus interface; selects one of the two

Two keys can be stored on the bus interface, and each key is 128 bits long.

Important remarks

  • The encrypted value depends on the address: other things being equal, encrypting the same data at two different addresses may not give the same result. Encrypted weights therefore cannot be moved around in memory, otherwise decryption fails.
  • Each parameter (except the address) should be kept as secret as possible. Leaking a parameter could make an attacker’s job easier.
  • The encryption algorithm is involutive: encrypting twice with the same parameters outputs the original data. The encryption process is therefore the same as the decryption process.
  • In the current implementation of the embedded low-level API, only the key values can vary; the other parameters are set to hardcoded values for simplicity.
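
The first and third remarks can be illustrated with a toy model. The sketch below is NOT the NPU algorithm, which is not described in this article: it simply XORs the data with a keystream derived (here with SHA-256, an arbitrary choice) from the key, the encryption ID, and the address. The function name and parameter layout are assumptions made for the illustration only.

```python
import hashlib


def toy_cipher(data: bytes, address: int, key: bytes, encryption_id: int = 0) -> bytes:
    """XOR `data` with a keystream derived from (key, encryption_id, address).

    Illustrative stand-in only: this is not the hardware algorithm.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block_address = address + offset
        seed = (key
                + encryption_id.to_bytes(4, "little")
                + block_address.to_bytes(8, "little"))
        keystream = hashlib.sha256(seed).digest()  # 32 bytes per block
        block = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(block, keystream))
    return bytes(out)


key = bytes.fromhex("aabbccddaabbccdd" * 2)  # 128-bit key
plain = b"weights of a fine-tuned model"
at_a = toy_cipher(plain, 0x7100_0000, key)
at_b = toy_cipher(plain, 0x7100_1000, key)

# Applying the same operation twice at the same address restores the data.
roundtrip_ok = toy_cipher(at_a, 0x7100_0000, key) == plain
# The same plaintext gives different ciphertext at another address.
address_dependent = at_a != at_b
# Ciphertext moved to another address no longer decrypts correctly.
moved_fails = toy_cipher(at_a, 0x7100_1000, key) != plain
print(roundtrip_ok, address_dependent, moved_fails)
```

Any XOR-with-keystream construction is involutive, which is why it is a convenient stand-in here; it says nothing about the strength or structure of the real implementation.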

Supported features and limitations

  • Only the weights/params can be encrypted; there is no support for encrypting/decrypting the activations (including the model itself) on-the-fly
  • No support for encrypting the blobs and/or bitstream for the epoch controller
  • The same key should be used for the different bus interfaces
  • When latency is a constraint, it is recommended to use the '--encrypt-weights' and '--no-encrypt-acc-confs' options together, to limit the overhead due to additional epochs.

Encrypting weights

ST Edge AI Core CLI does not encrypt weights; encryption must be done using dedicated tools installed together with the CLI. A set of specific Python scripts is provided in the $STEDGEAI_CORE_DIR/scripts/N6_encrypt folder.

$STEDGEAI_CORE_DIR represents the root location where the ST Edge AI Core components are installed, typically in a path like "<tools_dir>/STEdgeAI/2.1/".

Requirements

In addition to an STM32N6-based board (for example, an STM32N6570-DK board), these scripts require Python 3.9+ and the following modules (the Python distribution of STEdgeAI complies with this):

python >= 3.9
pyserial >= 3.5
protobuf >= 3.20.3
tqdm >= 4.64

Limitations

The current tool only supports:

  • Encryption of one memory initializer
  • Encryption of memory initializer in the .raw format only
  • Encryption of data in the range [0x7000'0000, 0x7800'0000] (address of external flash on the STM32N6-DK board when memory-mapped)
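
Before launching the tool, the last two limitations can be checked with a few lines of Python. This is a minimal sketch, not part of the delivered scripts: the function name is hypothetical, and treating the upper bound as the end address of the data (start plus size) is an assumption.

```python
from pathlib import Path

# Range quoted in the limitations above (external flash, memory-mapped).
XSPI_FLASH_START = 0x7000_0000
XSPI_FLASH_END = 0x7800_0000


def check_tool_limitations(raw_file: str, pool_address: int, size: int) -> list[str]:
    """Return the list of limitations violated by a memory initializer."""
    problems = []
    if Path(raw_file).suffix != ".raw":
        problems.append("memory initializer is not in the .raw format")
    # Assumption: the whole data span must fit inside the range.
    if pool_address < XSPI_FLASH_START or pool_address + size > XSPI_FLASH_END:
        problems.append("data falls outside [0x70000000, 0x78000000]")
    return problems


print(check_tool_limitations("network_atonbuf.xSPI2.raw", 0x7100_0000, 20_000))  # []
print(check_tool_limitations("network.bin", 0x3400_0000, 16))
```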

Hands-on: Encryption

This section shows how to adapt the flow presented in the getting started article.
Familiarity with that article is a prerequisite for following this section.

The global flow is very similar to the original one:

  1. Generate
  2. Encrypt weights
  3. Load the model on board
  4. Validate

Generate, for encrypted inference

When generating code for encrypted inference, the --encrypt-weights option must be added to the profile used during generation.

$> stedgeai generate -m MYMODEL.tflite --target stm32n6 --st-neural-art my_profile@my_profiles.json
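
As a hedged illustration, the option can be appended to a profile programmatically. The layout used below (a "Profiles" map whose entries carry the compiler flags in an "options" string) is an assumption about the profile file and should be checked against the actual my_profiles.json; the helper name is hypothetical.

```python
import json

# Assumed profile layout; verify it against your own profile file.
profiles = {
    "Profiles": {
        "my_profile": {
            "options": "",  # flags already in use would appear here
        }
    }
}


def add_option(options: str, flag: str) -> str:
    """Append `flag` to an options string unless it is already present."""
    tokens = options.split()
    if flag not in tokens:
        tokens.append(flag)
    return " ".join(tokens)


profile = profiles["Profiles"]["my_profile"]
profile["options"] = add_option(profile["options"], "--encrypt-weights")
print(json.dumps(profiles, indent=2))
```

The same helper can add '--no-encrypt-acc-confs' when the extra epochs described earlier are a concern.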

The generation process produces (in the output directory) files that are needed for the encryption step:

  • c_info.json file containing information about the generated code
  • *.raw files containing the weights to be programmed on board, unencrypted

Encrypt weights

For this step, it is required to have an STM32N6 board (for example, STM32N6-DK) connected to the computer.

$STEDGEAI_CORE_DIR/scripts/N6_encrypt folder contains:

  • A c subfolder including the embedded project used for encryption (delivered as source if ever needed),
  • A python folder including the scripts that should be used by users (also delivered as source if ever needed)
    • end_to_end_encrypt.py is the recommended script to use for easy encryption

Launching the tool encrypts a .raw file and replaces the original file with the encrypted one (the original plain file remains available in the .unencrypted file that is created).
The encrypted file can then be used as-is in the next step with n6_loader.py.

$> python end_to_end_encrypt.py --cubeide C:/ST/STM32CubeIDE_1.17.0/STM32CubeIDE --postprocess C:/stai/st_ai_output/network_c_info.json C:/stai/st_ai_output/network_atonbuf.xSPI2.raw

17:29:40.128 :: cubeIDE_toolbox.py :: INFO     :: Resetting the board
17:29:40.977 :: cubeIDE_toolbox.py :: INFO     :: Starting GDB server
17:29:41.132 :: cubeIDE_toolbox.py :: INFO     :: Starting GDB client
17:29:42.351 :: end_to_end_encrypt.py :: INFO     :: Waiting for the firmware to initialize
17:29:43.352 :: end_to_end_encrypt.py :: INFO     :: Starting encryption script
17:29:43.353 :: encrypt_neural_art.py :: INFO     :: Parsing c_info file
17:29:43.368 :: encrypt_neural_art.py :: INFO     :: Memory pool to encrypt found at address: 0x71000000 -- 20.142 kBytes to encrypt at offset 0
17:29:43.399 :: encrypt_neural_art.py :: INFO     :: Starting encryption
17:29:43.399 :: encrypt_neural_art.py :: INFO     :: Sending encryption params: keys = (MSB:0xaabbccddaabbccdd)(LSB:0xaabbccddaabbccdd) -- nb_rounds = 12
17:29:43.619 :: encrypt_neural_art.py :: INFO     :: Data transfer finished -- Took 0.197 seconds -- size = 20.142kB -- Encryption rate: 102.242kB/s
17:29:43.620 :: encrypt_neural_art.py :: INFO     :: Encrypted data injected into network_atonbuf.xSPI2_encrypted.raw
17:29:43.620 :: encrypt_neural_art.py :: INFO     :: Done
17:29:43.621 :: end_to_end_encrypt.py :: INFO     :: Postprocessing the files
17:29:43.621 :: end_to_end_encrypt.py :: INFO     :: Backup of original unencrypted weights: network_atonbuf.xSPI2.raw -> network_atonbuf.xSPI2.unencrypted
17:29:43.621 :: end_to_end_encrypt.py :: INFO     :: Replacing original file with encrypted weights: network_atonbuf.xSPI2_encrypted.raw -> network_atonbuf.xSPI2.raw
17:29:43.659 :: end_to_end_encrypt.py :: INFO     :: Done
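
The file handling reported at the end of the log above (backup of the plain file, then replacement by the encrypted one) amounts to two renames. The sketch below reproduces that naming scheme on temporary files; it is an illustration written for this article, not an extract of end_to_end_encrypt.py, and the function name is hypothetical.

```python
import tempfile
from pathlib import Path


def swap_in_encrypted(raw_path: Path) -> Path:
    """Back up the plain .raw file and put the encrypted one in its place.

    Returns the path of the backup holding the original plain weights.
    """
    encrypted = raw_path.with_name(raw_path.stem + "_encrypted.raw")
    backup = raw_path.with_suffix(".unencrypted")
    raw_path.replace(backup)      # <name>.raw -> <name>.unencrypted
    encrypted.replace(raw_path)   # <name>_encrypted.raw -> <name>.raw
    return backup


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "network_atonbuf.xSPI2.raw"
    raw.write_bytes(b"plain")
    (Path(tmp) / "network_atonbuf.xSPI2_encrypted.raw").write_bytes(b"cipher")
    backup = swap_in_encrypted(raw)
    result = (backup.name, raw.read_bytes(), backup.read_bytes())

print(result)  # ('network_atonbuf.xSPI2.unencrypted', b'cipher', b'plain')
```
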

end_to_end_encrypt.py details

end_to_end_encrypt.py command-line interface

usage: end_to_end_encrypt.py [-h] [-v] [-k KEYS KEYS] [-r NBROUNDS] [-p COMPORT] --cubeide CUBEIDE [--skip_debug] [--postprocess] [--flash] c_info raw_file

End-To-End encryption tool for weights using Neural Art

positional arguments:
  c_info                json file output of the compilation
  raw_file              memory-initializer file output of the compilation (.raw)

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Increase output verbosity (debug) (default: False)
  -k KEYS KEYS, --keys KEYS KEYS
                        Keys to use (MSB LSB) (default: [12302652059516914909, 12302652059516914909])
  -r NBROUNDS, --nbrounds NBROUNDS
                        Number of rounds (ignored for now) (default: 12)
  -p COMPORT, --comport COMPORT
                        COM-port name to be used for transmitting data to STLink. auto tries to connect to the first "STLink" found. (default: auto)
  --cubeide CUBEIDE     Path to cubeIDE install dir (default: None)
  --skip_debug          Skip loading the encryption tool elf file to the board (default: False)
  --postprocess         Postprocess the files to use them with the n6_loader script (default: False)
  --flash               After postprocessing, flash the encrypted weights to the board (default: False)

Positional arguments are mandatory. They are outputs of the generate step above.

  • *_c_info.json: contains information about memory pools and possible encryption to be considered. The full path to this file shall be provided.
  • .raw files: the one containing the weights (that should be encrypted) shall be given to the tool. The full path to this file shall be provided.

Two optional arguments should also be provided in most cases:

  • --cubeide shall be provided to allow programming the encryption firmware to the board. The full path to CubeIDE (path ending with STM32CubeIDE) should be provided.
  • --postprocess shall be provided to allow compatibility of the outputs with the next n6_loader.py call.

The remaining arguments can be provided (or not):

  • --help shows the usage information provided above
  • --verbose makes the tool verbose, useful for debug.
  • --skip_debug can be used to ignore the step where the encryption firmware is programmed on the board (cubeide is not required if this option is used)
  • --flash can be used to automatically flash the encrypted weights on board (not used if n6_loader.py is to be used afterwards, but could be useful depending on the use case)
  • --comport can be used to specify the communication port to use (not needed if only one STLINK is connected to the computer)

Finally, it is possible to change the encryption parameters:

  • --keys KEYS KEYS changes the keys to be set on the bus interfaces for encryption. The keys are 128-bit values, so two half-keys shall be provided (MSB/LSB). The default values are 0xaabbccddaabbccdd, 0xaabbccddaabbccdd, which are the keys set by default in the NPU_Validation_project, meaning this argument is not needed if the goal is to do a “validation on target”.

  • --nbrounds NBROUNDS (ignored for the current version) can be used to change the number of rounds done in the encryption algorithm.
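
Since a key is provided as two 64-bit halves, a 128-bit key held as a single integer has to be split first. A minimal sketch follows (the helper name is hypothetical). Note that the help text prints the default halves in decimal; which notations the --keys argument accepts is not stated here, so both forms are shown.

```python
def split_key(key_128: int) -> tuple[int, int]:
    """Split a 128-bit key into its (MSB, LSB) 64-bit halves."""
    if not 0 <= key_128 < 1 << 128:
        raise ValueError("key must fit in 128 bits")
    mask = (1 << 64) - 1
    return key_128 >> 64, key_128 & mask


msb, lsb = split_key(0xAABBCCDDAABBCCDD_AABBCCDDAABBCCDD)
print(hex(msb), hex(lsb))  # 0xaabbccddaabbccdd 0xaabbccddaabbccdd
print(msb)                 # 12302652059516914909, the default shown in the help text
```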

end_to_end_encrypt.py function

This script will:

  • Discover tools provided with cubeIDE (gdb-server, gdb, cubeprogrammer)
  • Using gdb-server and gdb, load the .elf file provided in the c/ directory (encryption firmware)
  • Parse the *c_info.json file to find the first memory-pool to be encrypted in the memory range [0x7000'0000, 0x7800'0000]
    • Within this memory pool, find where the data to be encrypted is located
  • Send encryption parameters to the encryption firmware (this will set the keys used for encryption during this session)
  • Split the data in the .raw file provided into chunks of 4 Kbyte, starting from the address found above
  • Send each chunk along with its final address on target to the encryption firmware
  • (The encryption firmware receives the chunk, performs the encryption, and sends it back to the Python script through UART)
  • Receive the encrypted chunk
  • Rebuild the whole encrypted file
    • Move the initial .raw unencrypted file to .unencrypted
    • Store the encrypted file to the initial .raw file
  • If the postprocess option is passed, change timestamps for all the files generated by the generation phase to allow the n6_loader to function properly
  • If the flash option is passed, use cube programmer to flash the encrypted weights on board.
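
The chunking step in the list above can be sketched as follows. This is an illustration rather than the script's actual code: the function name is hypothetical, and it assumes that byte 0 of the data maps to the base address of the memory pool.

```python
from typing import Iterator

CHUNK_SIZE = 4 * 1024  # 4 Kbyte chunks, as described above


def iter_chunks(data: bytes, base_address: int) -> Iterator[tuple[int, bytes]]:
    """Yield (final_address, chunk) pairs covering `data`."""
    for pos in range(0, len(data), CHUNK_SIZE):
        yield base_address + pos, data[pos:pos + CHUNK_SIZE]


raw = bytes(10_000)  # stand-in for the contents of a .raw file
chunks = list(iter_chunks(raw, 0x7100_0000))
print([(hex(addr), len(chunk)) for addr, chunk in chunks])
# [('0x71000000', 4096), ('0x71001000', 4096), ('0x71002000', 1808)]

# Rebuilding the file is a plain concatenation of the returned chunks.
rebuilt = b"".join(chunk for _, chunk in chunks)
```

Each chunk carries its own final address because the address is an input of the encryption, as noted in the important remarks.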

Load the model on board

After encrypting the weights (do not forget to use --postprocess in the script above), the standard flow described in the getting started applies:

$> python n6_loader.py --n6-loader-config config_n6l.json

Running an inference

Since the NPU_Validation_project is “encryption-ready” (using default bus interface keys), it is now possible to perform a validation on target:

$> stedgeai validate -m MYMODEL.tflite --target stm32n6 --mode target --desc serial:921600 --val-json st_ai_output/network_c_info.json

LL ATON runtime API extension

Most functions concerned by encryption are defined in ll_aton_cipher.c.

Encrypted transfers during the inference

When using the ST Neural-ART compiler with the --encrypt-weights option, the generated "network.c" file most likely contains calls to the LL_Streng_WeightEncryptionInit() function. These calls, automatically generated by the NPU compiler, are inserted as requested after configuring the stream engine (also known as DMA) transfer to enable an encrypted transfer (ciphertext).

In the current version, the implementation of this function is lightweight and contains hard-coded values for the encrypted transfer:

  • Encryption ID is set to 0
  • Encryption rounds are set to 12
  • The key selected is the one with the index 0
  • The encryption ID increment is set to 0 (no increment)

The transfers that are overloaded with this “decryption” decorator are then done using the key with the index 0. Such a key must be programmed on the bus interfaces, using the following function:

LL_Busif_SetKeys ( int bus_interface_no , int key_id , uint64_t busif_lsb_key , uint64_t busif_msb_key );

The bus interfaces shall be configured before launching an inference so that the encrypted transfers are done with the correct keys. In the NPU_Validation_project, the default keys are as follows:

#define BUSIF_LSB_KEY      0xAABBCCDDAABBCCDD
#define BUSIF_MSB_KEY      0xAABBCCDDAABBCCDD

// Set key at index 0 (the only one used by LL_Streng_WeightEncryptionInit)
LL_Busif_SetKeys ( 0 , 0 , BUSIF_LSB_KEY , BUSIF_MSB_KEY );     // Set key on busif 0: for STRENG 0-4
LL_Busif_SetKeys ( 1 , 0 , BUSIF_LSB_KEY , BUSIF_MSB_KEY );     // Set key on busif 1: for STRENG 5-9+Epoch controller

Encrypted transfers will then use the key stored on the interface. It is good practice to use the same key on both bus interfaces; otherwise, decrypting transfers done with different stream engines will not work. Non-encrypted transfers simply pass through the interface without using the encryption key (the configuration of the bus interface keys does not prevent doing plain, unencrypted, transfers).

Manually encrypting a buffer

Note

Since the address of the data is an input of the encryption algorithm, a transfer that encrypts a buffer shall be done towards (or from) the final address where this buffer will be stored.

Easy case

For most cases (except when dealing with weights stored in flash – see below), doing a data transfer to encrypt a buffer is straightforward.
ll_aton_cipher provides int LL_DmaCypherInit(LL_Cypher_InitTypeDef *cypherInfo) that can be used to do a stream engine transfer to encrypt data.

The argument of this function is of type LL_Cypher_InitTypeDef:

typedef struct {
    uint32_t srcAdd;                       /**< Transfer source address */
    uint32_t dstAdd;                       /**< Transfer destination address */
    uint32_t len;                          /**< Transfer size */
    CypherCacheSourceMask cypherCacheMask; /**< Cache usage mask:
                                            *     0-no cache
                                            *     1-cache source
                                            *     2-cache destination */
    CypherEnableMask cypherEnableMask;     /**< Cyphering channel mask:
                                            *     0-no cypher
                                            *     1-cypher source
                                            *     2-cypher destination */
    uint64_t busIfKeyLsb;                  /**< Bus interface LSB Key */
    uint64_t busIfKeyMsb;                  /**< Bus interface MSB Key */
} LL_Cypher_InitTypeDef;

  • srcAdd, dstAdd and len are used respectively for the source address, the destination address and the transfer size.
  • cypherEnableMask can be used to activate cyphering: the address used for encrypting weights can be either the source or the destination address depending on the use-case.
  • cypherCacheMask should not be used for the standard case (not triggering encryption of external-flash-bound weights).

The last two fields contain the bus interface key to use: 2x64-bits = 128 bits. This function returns when the data transfer is over.

Tricky case: Encrypting for external flash

Encrypting to external flash can be tricky because the stream engines cannot write directly to external flash: the capability to write to an external flash through the XSPI interface in memory-mapped mode is not guaranteed.

One solution is to store the plaintext in external flash at the final address, then encrypt it, using the source address as the encryption parameter, into a buffer in internal RAM. The resulting buffer in RAM can then be written to flash by the CPU, at the final address. The drawback of this solution is that the unencrypted data is stored in flash for a period of time.

Advanced trick: not using the flash memory at all

If it is required (or preferred) not to use the external flash at all, it is also possible to do the following (which is what is done in the encryption firmware provided). Assume the buffer to be encrypted is in RAM and must end up, encrypted, in external flash in the final application. It is then possible to do an encryption transfer using the final destination address as the parameter for encryption and to force the stream engine to write to the AXICACHE. This is done by setting cypherCacheMask to use the cache for the final destination address, that is, the write is routed to the cache instead of external flash.

This removes the need to access external flash: the encrypted buffer is stored in AXICACHE lines.
It can then be retrieved with a simple (non-encrypting) stream-engine transfer that reads the encrypted AXICACHE lines (see dma_memcpy_with_streng in Weights_encryption/FSBL/Core/Src/encrypt.c), by routing the read to the AXICACHE instead of reading from the external flash.

To use this trick:

  • The amount of encryption to be done at once shall not exceed the cache size: 256 Kbyte = 0x40000 bytes
  • Do not explicitly clean a dirty cache containing data bound for external flash, or the system will hang.
  • Extra care should be taken to prevent any eviction of a line that should be written to external flash, or everything will hang. (The cache should be invalidated before doing a new refill, or fully reset).
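
The size constraint above implies that a large buffer has to be encrypted in several passes. The following Python sketch only plans those passes; it does not drive the hardware. The Transfer class is a hypothetical mirror of the LL_Cypher_InitTypeDef fields, the Mask names are invented for readability (only the numeric values 0/1/2 come from the structure comments), and the RAM address used in the example is illustrative.

```python
from dataclasses import dataclass
from enum import IntEnum

AXICACHE_SIZE = 0x40000  # 256 Kbyte, the limit quoted above


class Mask(IntEnum):
    """Values documented for cypherCacheMask and cypherEnableMask."""
    NONE = 0
    SOURCE = 1
    DESTINATION = 2


@dataclass
class Transfer:
    """Hypothetical mirror of the LL_Cypher_InitTypeDef fields used here."""
    src_add: int
    dst_add: int
    length: int
    cache_mask: Mask
    enable_mask: Mask


def plan_cache_batches(ram_src: int, flash_dst: int, total: int) -> list[Transfer]:
    """Split an encryption job into transfers of at most one cache size."""
    batches = []
    for done in range(0, total, AXICACHE_SIZE):
        batches.append(Transfer(
            src_add=ram_src + done,
            dst_add=flash_dst + done,      # final address drives the encryption
            length=min(AXICACHE_SIZE, total - done),
            cache_mask=Mask.DESTINATION,   # route the write to the AXICACHE
            enable_mask=Mask.DESTINATION,  # encrypt with the destination address
        ))
    return batches


batches = plan_cache_batches(0x3400_0000, 0x7100_0000, 0x50000)
for b in batches:
    print(hex(b.src_add), hex(b.dst_add), hex(b.length))
```

Between two passes, the encrypted lines have to be read back and the cache invalidated (or fully reset), in line with the last two points of the list above.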

Final remarks

The source code for both the embedded project and the Python tool is provided as-is.
It is provided as an example, and it is not necessary to understand how it works to use the tool.

The communication protocol is written using the protobuf and nanopb libraries, which makes it easy to generate bindings for embedded C and Python if ever needed.

For generating/modifying python/c bindings for the protocol defined, extra modules are needed:

The tool uses the ".elf" file generated by the embedded project, erasing it causes issues; modifying it may cause malfunctions.