The runtime is a small piece of software that is ported to the target device to run the compiled model. It is a lightweight, fast inference engine written in C that executes the optimized kernels generated by the SDK. Our runtime supports heterogeneous execution across CPUs, GPUs, and NPUs, handles dynamic shapes for models such as LLMs, and can be extended to custom NPUs through hardware abstraction layers. It runs on Linux, macOS, Windows, and bare-metal systems.
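To make the idea of a hardware abstraction layer concrete, here is a minimal sketch in C of how a runtime could route operators to different backends through a table of function pointers. It is an illustration only, not the runtime's actual API: every name in it (`hal_backend`, `hal_register`, `hal_select`, `hal_dispatch`, the `relu` and `neg` operators, and the toy NPU) is hypothetical, and the "NPU" is simulated on the CPU.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical backend interface: the runtime only ever calls through
 * these function pointers, so adding a custom NPU means filling in
 * one more table. All names here are illustrative assumptions. */
typedef struct {
    const char *name;
    /* Returns nonzero if this backend can execute the named operator. */
    int (*supports)(const char *op);
    /* Executes the operator on n floats; returns 0 on success. */
    int (*run)(const char *op, const float *in, float *out, size_t n);
} hal_backend;

#define HAL_MAX_BACKENDS 4
static const hal_backend *g_backends[HAL_MAX_BACKENDS];
static size_t g_backend_count = 0;

/* Registration order doubles as priority: earlier backends win. */
int hal_register(const hal_backend *b) {
    if (b == NULL || g_backend_count >= HAL_MAX_BACKENDS) return -1;
    g_backends[g_backend_count++] = b;
    return 0;
}

/* Returns the first registered backend that supports op, or NULL. */
const hal_backend *hal_select(const char *op) {
    for (size_t i = 0; i < g_backend_count; i++) {
        if (g_backends[i]->supports(op)) return g_backends[i];
    }
    return NULL;
}

/* Selects a backend for op and runs it; returns -1 if none supports it. */
int hal_dispatch(const char *op, const float *in, float *out, size_t n) {
    const hal_backend *b = hal_select(op);
    return b ? b->run(op, in, out, n) : -1;
}

/* CPU reference backend: supports both toy operators. */
static int cpu_supports(const char *op) {
    return strcmp(op, "relu") == 0 || strcmp(op, "neg") == 0;
}
static int cpu_run(const char *op, const float *in, float *out, size_t n) {
    int is_relu = strcmp(op, "relu") == 0;
    if (!is_relu && strcmp(op, "neg") != 0) return -1;
    for (size_t i = 0; i < n; i++) {
        out[i] = is_relu ? (in[i] > 0.0f ? in[i] : 0.0f) : -in[i];
    }
    return 0;
}
const hal_backend cpu_backend = { "cpu", cpu_supports, cpu_run };

/* Toy "NPU" backend: supports only relu (simulated on the CPU here). */
static int npu_supports(const char *op) {
    return strcmp(op, "relu") == 0;
}
static int npu_run(const char *op, const float *in, float *out, size_t n) {
    if (!npu_supports(op)) return -1;
    for (size_t i = 0; i < n; i++) out[i] = in[i] > 0.0f ? in[i] : 0.0f;
    return 0;
}
const hal_backend npu_backend = { "toy-npu", npu_supports, npu_run };
```

With the toy NPU registered before the CPU, `hal_dispatch("relu", ...)` is served by the NPU backend, while `hal_dispatch("neg", ...)` falls back to the CPU because the NPU does not claim that operator. A real runtime would likely make this decision per graph partition at compile or load time rather than per call; the per-call lookup here just keeps the sketch short.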
Runtime
A lightweight runtime that executes the compiled model on the device. It provides the features needed for efficient SoC orchestration.
