In collaboration with NXP® Semiconductors, we built on Roofline’s scalable MLIR and IREE compiler infrastructure to extend our heterogeneous execution stack to NPUs. Starting with LLM enablement for NXP’s eIQ® Neutron NPU on the i.MX 95 applications processor, this case study showcases three key advantages of the software enablement: 1) unlocking broad model coverage, 2) overcoming accelerator memory limitations for models larger than 2 GB, and 3) delivering clear gains of up to 3.2× in LLM prefill performance over CPU-only execution.
This case study shows how Roofline achieves up to 23× higher throughput by solving one of the most fundamental bottlenecks in on-device language model inference: dynamic shapes in the prefill stage.
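Why are dynamic shapes a bottleneck? In prefill, the model processes the whole prompt at once, so kernel shapes depend on the prompt length, which is unknown at compile time. The conventional workaround for static-shape compilers is to pad every prompt up to one of a few fixed bucket sizes. The sketch below is a hypothetical illustration of that workaround (the function and bucket sizes are ours, not Roofline’s implementation); the padded tokens it reports are wasted compute, which is exactly what native dynamic-shape support avoids:

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative only: round a prompt length up to the next power-of-two
 * bucket so a static-shape compiler can emit a fixed set of prefill
 * kernels. Everything between prompt_len and the bucket is padding. */
static int64_t bucket_seq_len(int64_t prompt_len) {
  int64_t bucket = 64; /* smallest supported prefill shape (assumed) */
  while (bucket < prompt_len) bucket *= 2;
  return bucket; /* 64, 128, 256, ... */
}

int main(void) {
  int64_t prompts[] = {17, 100, 500};
  for (int i = 0; i < 3; i++) {
    int64_t padded = bucket_seq_len(prompts[i]);
    printf("prompt of %lld tokens -> kernel runs on %lld (%lld wasted)\n",
           (long long)prompts[i], (long long)padded,
           (long long)(padded - prompts[i]));
  }
  return 0;
}
```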
This case study shows how Roofline enabled asynchronous heterogeneous execution on modern edge Systems-on-Chip (SoCs). Our technology coordinates an SoC’s CPU, GPU, and NPU to run full AI models efficiently, unlocking a long-missing piece of edge AI deployment software: running larger models on the best-suited device available.
This case study showcases how Roofline and Arm enabled scalable, vector-length-agnostic ML execution on Arm CPUs by implementing data-tiled Scalable Vector Extension (SVE) support end-to-end in IREE, unlocking up to 100× speedups on real models and hardware.
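For readers unfamiliar with SVE, “vector-length-agnostic” means the same machine code adapts at run time to whatever vector width the hardware provides, from 128 up to 2048 bits. The minimal kernel below, written with the Arm C Language Extensions (ACLE) intrinsics, is a generic sketch of that property, not the data-tiled code IREE generates:

```c
#include <arm_sve.h>
#include <stdint.h>

/* Vector-length-agnostic saxpy: dst[i] = a * x[i] + y[i].
 * svcntw() queries the number of 32-bit lanes at run time, and the
 * svwhilelt predicate masks off lanes past n, so one binary runs
 * correctly on 128-bit, 256-bit, ... 2048-bit SVE implementations. */
void saxpy(float *dst, const float *x, const float *y, float a, int64_t n) {
  for (int64_t i = 0; i < n; i += svcntw()) {
    svbool_t pg = svwhilelt_b32(i, n);     /* active-lane predicate */
    svfloat32_t vx = svld1_f32(pg, x + i); /* predicated loads      */
    svfloat32_t vy = svld1_f32(pg, y + i);
    svst1_f32(pg, dst + i, svmla_n_f32_x(pg, vy, vx, a));
  }
}
```

Built once with, e.g., `clang -O2 -march=armv8-a+sve`, this kernel never hard-codes a vector width; the data-tiled code paths in the case study exploit the same property across full models.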