Edge SoCs are becoming increasingly complex and heterogeneous. Putting the entire chip (CPU, GPU, and NPU) to work on AI workloads is key to building performant edge AI products. For example, this paves the way for on-device agentic AI, where multiple models run in parallel across different compute cores.
roofline's SDK provides one integrated toolchain to easily switch execution targets. Enabled by our MLIR-based compiler, retargeting takes just one line of code.
In this demo, we migrate Qwen3 from Qualcomm's Oryon™ CPU to the Adreno™ GPU and achieve a speed-up of roughly 2x, without switching tools or rewriting any code by hand.