In collaboration with NXP® Semiconductors, we built on Roofline’s scalable MLIR and IREE compiler infrastructure to extend our heterogeneous execution stack to NPUs. Starting with LLM enablement for NXP’s eIQ® Neutron NPU on the i.MX 95 applications processor, this case study showcases three key advantages of the software enablement: 1) unlocking broad model coverage, 2) overcoming accelerator memory limitations for models larger than 2 GB, and 3) delivering clear gains of up to 3.2× in LLM prefill performance over CPU-only execution.
This case study shows how Roofline achieves up to 23× higher throughput by solving one of the most fundamental bottlenecks in on-device language model inference: dynamic shapes in the prefill stage.
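Why are dynamic shapes a bottleneck? In prefill, the model processes the whole prompt at once, so kernel shapes depend on the prompt length, which is unknown at compile time. The conventional workaround for static-shape compilers is to pad every prompt up to one of a few fixed bucket sizes. The sketch below is a hypothetical illustration of that workaround (the function and bucket sizes are ours, not Roofline’s implementation); the padded tokens it reports are wasted compute, which is exactly what native dynamic-shape support avoids:

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: round a prompt length up to the next power-of-two
 * bucket so a static-shape compiler can emit a fixed set of prefill
 * kernels. Everything between prompt_len and the bucket is padding. */
static int64_t bucket_seq_len(int64_t prompt_len) {
  int64_t bucket = 64; /* smallest supported prefill shape (assumed) */
  while (bucket < prompt_len) bucket *= 2;
  return bucket; /* 64, 128, 256, ... */
}

int main(void) {
  int64_t prompts[] = {17, 100, 500};
  for (int i = 0; i < 3; i++) {
    int64_t padded = bucket_seq_len(prompts[i]);
    printf("prompt of %lld tokens -> kernel runs on %lld (%lld wasted)\n",
           (long long)prompts[i], (long long)padded,
           (long long)(padded - prompts[i]));
  }
  return 0;
}
```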
This case study shows how Roofline enabled asynchronous heterogeneous execution on modern edge Systems-on-Chip (SoCs). Our technology coordinates an SoC’s CPU, GPU, and NPU to run full AI models efficiently, unlocking a long-missing piece of edge AI deployment software: running larger models on the best-suited device available.
This case study showcases how Roofline and Arm enabled scalable, vector-length-agnostic ML execution on Arm CPUs by implementing data-tiled Scalable Vector Extension (SVE) support end-to-end in IREE, unlocking up to 100× speedups on real models and hardware.
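For readers unfamiliar with SVE, “vector-length-agnostic” means the same machine code adapts at run time to whatever vector width the hardware provides, from 128 up to 2048 bits. The minimal kernel below, written with the Arm C Language Extensions (ACLE) intrinsics, is a generic sketch of that property, not the data-tiled code IREE generates:

```c
#include <arm_sve.h>
#include <stdint.h>

/* Vector-length-agnostic saxpy: dst[i] = a * x[i] + y[i].
 * svcntw() queries the number of 32-bit lanes at run time, and the
 * svwhilelt predicate masks off lanes past n, so one binary runs
 * correctly on 128-bit, 256-bit, ... 2048-bit SVE implementations. */
void saxpy(float *dst, const float *x, const float *y, float a, int64_t n) {
  for (int64_t i = 0; i < n; i += svcntw()) {
    svbool_t pg = svwhilelt_b32(i, n);     /* active-lane predicate */
    svfloat32_t vx = svld1_f32(pg, x + i); /* predicated loads      */
    svfloat32_t vy = svld1_f32(pg, y + i);
    svst1_f32(pg, dst + i, svmla_n_f32_x(pg, vy, vx, a));
  }
}
```

Built once with, e.g., `clang -O2 -march=armv8-a+sve`, this kernel never hard-codes a vector width; the data-tiled code paths in the case study exploit the same property across full models.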