Edge SoCs are becoming increasingly complex and heterogeneous. Unlocking the entire chip for AI workloads (CPU, GPU, NPU) is key to building performant edge AI products. For example, this paves the way for on-device agentic AI, where multiple models can run in parallel across different compute cores.
roofline's SDK provides one integrated toolchain to easily switch execution targets. Enabled by our MLIR-based compiler, retargeting takes just one line of code.
In this demo, we showcase the migration of Qwen3 from Qualcomm's Oryon™ CPU to the Adreno™ GPU, realizing a speed-up of ~2x, without any switching of tools or manual rewrites.
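To illustrate the idea of a one-line target switch, here is a minimal conceptual sketch. Note that `compile_model` and the target names are hypothetical stand-ins for illustration only, not roofline's actual SDK API:

```python
# Conceptual sketch: switching execution targets by changing one argument.
# 'compile_model' and the target names are hypothetical, not a real SDK API.

def compile_model(model_name: str, target: str = "cpu") -> str:
    """Pretend-compile a model for a given execution target."""
    supported = {"cpu", "gpu", "npu"}
    if target not in supported:
        raise ValueError(f"unknown target: {target}")
    return f"{model_name}-compiled-for-{target}"

# Retargeting is a one-argument change; the rest of the pipeline stays the same:
cpu_artifact = compile_model("qwen3", target="cpu")
gpu_artifact = compile_model("qwen3", target="gpu")
print(cpu_artifact)  # qwen3-compiled-for-cpu
print(gpu_artifact)  # qwen3-compiled-for-gpu
```

The point of the sketch: the model definition and the rest of the deployment flow are untouched; only the target parameter changes.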
We are honored to announce that roofline has been selected by the European Innovation Council (EIC) Accelerator, receiving €2.5M in grant funding and a pre-committed equity investment for our next funding round.
The EIC Accelerator is Europe’s most competitive deep tech funding program. In this round, only 40 startups were selected from nearly 1,000 applicants: https://lnkd.in/e4aeD_sc
The recognition of our project "Retargetable AI Compiler Technology for Scalable Edge Deployment of Next-Generation AI Models" not only validates the potential of our technology, it fuels our mission to enable the edge AI products you dream of. Thank you to European Innovation Council and SMEs Executive Agency (EISMEA) and everyone who supported us on this journey.
On-device image-to-text with multimodal LLMs

LLMs are getting smaller and now fit on edge systems. But bringing them into products and unlocking disruptive use cases remains a challenge. Common edge AI deployment tools cannot keep up with the pace of AI innovation, especially with cutting-edge models like multimodal LLMs.
Here is a look at what roofline's MLIR-based compiler can do. We run an image-to-text task using Google DeepMind's Gemma-3-4B, fully compiled, on real Qualcomm edge hardware:
🖼️ Input: Camera view of a mobile robot in an aisle.
💬 Output: Natural language reasoning. The mobile robot decides to slow down and adjust its path.
⚡ Performance: ~9x faster than TorchInductor.
Curious? Let's talk
We’re proud to be part of the Edge AI Foundation as a new Strategic Partner. At roofline, we help developers go from idea to deployment—without the usual months of onboarding or complexity. Our AI compiler technology enables faster, more flexible, and more efficient deployment of GenAI models on edge SoCs. Through close collaboration with leading chip vendors, we’re unlocking the full potential of edge AI hardware and supporting innovation across the ecosystem.

We’re excited to contribute to the shared mission of advancing the future of edge AI, together with the Edge AI Foundation and its community.
We are excited to be on the ground and give two talks at this year's EuroLLVM Developers' Meeting, hosted by the LLVM Foundation.
In their talks, our CTO Maximilian Bartel and Lead Engineer Christopher McGirr will be sharing key insights from our compiler work — focusing on MLIR and how to effectively engage with modern compiler infrastructure and the upstream community.
If you’re working on compilers, AI workloads, or MLIR — let’s meet up! We’d love to exchange ideas and discuss how we are building our AI deployment stack for the next generation of edge AI.
llama.cpp is often the go-to solution for running LLMs on edge devices, leveraging handwritten kernels for optimized execution. But while this approach delivers speed, it lacks flexibility when supporting new models and quantization techniques.
For instance, highly relevant models like Apple's OpenELM took months to be integrated into llama.cpp. Similarly, the GitHub issue to fix DeepSeek AI's multimodal janus-pro-1b has been open for nearly two months — highlighting the challenges of adapting to emerging architectures.
That’s why roofline is building an MLIR-based edge AI compiler with a model-agnostic LLM pipeline. Our approach enables day-0 deployment for DeepSeek’s janus-pro-1b and ensures rapid support for the latest quantization methods. Below, we showcase how our solution delivers ~8× performance gains over the native PyTorch compiler (TorchInductor) — and also why orange might just be the perfect company color!
Curious about how generic and scalable our approach is? Let’s chat! And stay tuned — janus-pro-1b also supports image-to-text and text-to-image capabilities.
Excited to be attending HiPEAC and the CODAI Workshop 2025 in Barcelona this week!
Today, I will present roofline's AI compiler technology, showcasing how it makes edge AI deployment easier and more flexible. If you want to learn more or get a demo, let me know.
Roofline is also hiring! If you have a background in compilers, computer architecture, and C++ and are looking for opportunities in Germany, let’s connect during the event.