Mirai builds a Rust inference engine to accelerate on-device AI
Mirai: compact runtime for on-device model inference
A small London team has launched a runtime focused on accelerating ML inference on phones and laptops, backed by $10M in seed funding. The initial implementation targets Apple Silicon and uses a Rust codebase that the founders say can lift generation throughput by roughly 37%.
Integration is designed to be lightweight: Mirai plans an SDK that lets developers embed the runtime with only a handful of lines of code, turning device-resident models into usable product features quickly. The team emphasizes tuning the runtime and execution path while preserving model weights and the original output quality.
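Mirai has not published its SDK surface, so the Rust sketch below is purely illustrative: the Runtime and Model types, the file path, and the method names are hypothetical stand-ins meant only to show the "few lines from init to a usable completion" integration shape the company describes.

```rust
// Hypothetical sketch only: Mirai's real SDK is not public, so the types and
// methods below are placeholders illustrating the "few lines to integrate" idea.
struct Runtime;          // stand-in for an on-device inference runtime handle
struct Model;            // stand-in for a loaded, device-resident model

impl Runtime {
    // Load a model file that ships with the app bundle (path is illustrative).
    fn load_model(&self, path: &str) -> Model {
        println!("loading model from {path}");
        Model
    }
}

impl Model {
    // Run text generation entirely on-device and return the completion.
    fn generate(&self, prompt: &str) -> String {
        format!("(on-device completion for: {prompt})")
    }
}

fn main() {
    // The integration story the article describes: runtime init, model load,
    // and a completion in a handful of lines, with no network call involved.
    let runtime = Runtime;
    let model = runtime.load_model("models/assistant.bin");
    let reply = model.generate("Summarise today's meetings");
    println!("{reply}");
}
```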
Product scope is staged. Today the stack focuses on text and voice modalities; vision support is on the roadmap. To support ecosystem validation, Mirai intends to publish on-device benchmarks so model creators can compare edge performance against cloud baselines.
- Planned features: runtime SDK, benchmark suite, and hybrid orchestration layer to fall back to cloud when needed.
- Platform expansion: talks with chipmakers and a future Android port are in progress.
- Targeted apps: low-latency assistants, transcribers, translators, and local chat agents.
Investors framed this as a response to rising cloud inference spend: running more compute at the edge lowers per-request costs for high-volume consumer services. Backing came from an investor syndicate led by Uncork Capital alongside several individual technical investors.
The company also plans an orchestration layer that can route requests which exceed device capability up to remote servers, acknowledging that some ML tasks will remain cloud-native for the foreseeable future. That mixed-mode approach is central to Mirai’s go-to-market story: speed and cost reduction where feasible, cloud fallback where necessary.
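To make the mixed-mode idea concrete, here is a hedged Rust sketch of a device-versus-cloud routing decision. The request fields, the token budget, and the routing rule are assumptions for illustration, not Mirai's actual orchestration logic; it only shows the general pattern of preferring the device when a request fits its limits and falling back to the cloud otherwise.

```rust
// Hedged sketch of the mixed-mode routing idea: field names, the token budget,
// and the decision rule are assumptions, not Mirai's actual API or policy.
enum Route {
    OnDevice,
    Cloud,
}

struct Request {
    prompt_tokens: usize,
    needs_vision: bool, // vision is not yet supported on-device per the article
}

// Decide where a request should run, preferring the device when it fits.
fn route(req: &Request, device_token_budget: usize) -> Route {
    if req.needs_vision || req.prompt_tokens > device_token_budget {
        Route::Cloud
    } else {
        Route::OnDevice
    }
}

fn main() {
    let small = Request { prompt_tokens: 512, needs_vision: false };
    let large = Request { prompt_tokens: 16_384, needs_vision: false };

    // Assume a hypothetical 4k-token on-device budget for this example.
    for req in [&small, &large] {
        match route(req, 4_096) {
            Route::OnDevice => println!("run locally"),
            Route::Cloud => println!("fall back to cloud"),
        }
    }
}
```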
On the developer side, the promise is straightforward: fewer integration steps, lower inference latency, and reduced dependence on ongoing cloud calls. For model vendors, Mirai is opening a path to validate edge suitability through its benchmarks and tuned runtimes.
Risks remain. Hardware limits constrain large multimodal models, and success depends on partnerships with model makers and silicon vendors to tune workloads sensibly. Still, this stack could materially change cost and latency dynamics for common consumer AI features if adoption grows.