
Cohere launches Tiny Aya — open, offline-first multilingual LLMs
Cohere has announced a new family of open multilingual models engineered for offline use and regional-language fluency. The core model is a 3.35-billion-parameter LLM trained on a single cluster of 64 Nvidia H100 GPUs, and the release prioritizes local deployment on laptop-class hardware.
The Tiny Aya line targets native-language applications across many regions, with explicit support for more than 70 languages and focused coverage of South Asian languages such as Bengali, Hindi, Punjabi, Urdu, Gujarati, Tamil, Telugu, and Marathi. Cohere has organized the family into distinct variants for specialist use: a globally aligned instruction-tuned build and regional forks tuned for African, South Asian, and Asia-Pacific/West Asian/European languages.
Engineering choices emphasize efficiency. Training used a single 64-H100 cluster rather than a multi-thousand-GPU fleet, and the runtime software is optimized so instances run without persistent cloud access. That enables on-device translation, privacy-preserving inference, and lower latency in the disconnected environments common in emerging markets.
Cohere is releasing the models, training corpora, and evaluation sets on community platforms to enable replication and downstream adaptation. Distribution includes downloads via Hugging Face, local runtime support through Ollama, and deployment examples for Kaggle and the Cohere Platform. A technical report describing the training methodology and evaluation protocols will follow.
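As a rough sketch, loading one of the open weights locally with Hugging Face transformers could look like the following; the repository id used here is a placeholder, since the announcement does not name the exact artifacts.

```python
# Minimal sketch: local inference with Hugging Face transformers.
# NOTE: the repo id below is a placeholder -- check Cohere's Hugging Face
# page for the actual Tiny Aya artifact names.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "CohereLabs/tiny-aya-base"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # half precision keeps a 3.35B model near ~6 GB
    device_map="auto",          # falls back to CPU if no GPU is present
)

prompt = "भारत की राजधानी क्या है?"  # "What is the capital of India?" in Hindi
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```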
The launch was announced alongside the India AI Summit, signaling an explicit go-to-market focus on linguistically diverse countries. For developers and researchers, the open-weight license reduces friction for customization, fine-tuning, and offline distribution on constrained networks.
From a product lens, the family balances model scale and footprint. The base configuration (3.35B parameters) is positioned for single-device inference, while the instruction-tuned and regional variants offer stronger instruction-following and cultural nuance. That design lowers the barrier to entry for startups and research teams without large compute budgets.
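Back-of-the-envelope arithmetic makes the footprint claim concrete: weight memory is roughly parameter count times bytes per parameter, before activations and KV cache are added on top.

```python
# Rough weight-memory estimate for a 3.35B-parameter model
# (weights only; activations and KV cache add overhead on top).
PARAMS = 3.35e9

for label, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    gb = PARAMS * bytes_per_param / 1024**3
    print(f"{label}: ~{gb:.1f} GB")

# fp16: ~6.2 GB, int8: ~3.1 GB, int4: ~1.6 GB -- quantized builds
# fit comfortably in laptop RAM.
```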
Cohere’s move also puts pressure on the broader model ecosystem. By publishing compact, open models with strong regional coverage, Cohere intensifies competition with both closed large-scale LLM providers and other open-weight efforts focused on multilingual or on-device use.
Adoption vectors include offline translation, regional conversational agents, and privacy-sensitive tools for journalism, education, and local-government services. The availability of datasets and evaluation artifacts should accelerate benchmarking and third-party audits in low-resource languages.
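As an illustration of the offline-translation vector, a call through a locally running Ollama instance might look like this; the model tag is hypothetical, pending the official registry name.

```python
# Sketch: offline translation via a locally running Ollama instance.
# Assumes `ollama serve` is running and the model has been pulled;
# the tag "tiny-aya" is hypothetical.
import ollama

response = ollama.chat(
    model="tiny-aya",
    messages=[{
        "role": "user",
        "content": "Translate to Tamil: The clinic opens at nine tomorrow.",
    }],
)
print(response["message"]["content"])
```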
Commercial context: Cohere has signaled a near-term IPO path and finished 2025 with robust recurring revenue growth, which gives the company runway to support open research efforts while pursuing enterprise customers. Expect the company to position Tiny Aya as both a research contribution and a funnel for localized enterprise deployments.
Developers should evaluate Tiny Aya on three vectors: latency and memory footprint on target devices, instruction-following fidelity for local languages, and dataset provenance for safety and bias assessment. The model family is well suited for experiments that require offline inference and rapid regional adaptation.
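A minimal profiling harness for the first two of those vectors, assuming the transformers setup sketched earlier, might look like:

```python
# Sketch: measure generation latency and peak GPU memory for a local model.
# Reuses the `model` and `tokenizer` objects from the loading sketch above;
# falls back to wall-clock-only measurement on CPU.
import time
import torch

def profile_generation(model, tokenizer, prompt, max_new_tokens=128):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    if torch.cuda.is_available():
        torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    elapsed = time.perf_counter() - start
    new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
    print(f"{new_tokens / elapsed:.1f} tokens/s over {elapsed:.2f}s")
    if torch.cuda.is_available():
        peak_gb = torch.cuda.max_memory_allocated() / 1024**3
        print(f"peak GPU memory: {peak_gb:.2f} GB")
```

Running the same harness against fp16 and quantized builds gives a quick read on whether a given target device can sustain interactive latencies.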
- Model parameters: 3.35B
- Languages supported: 70+
- Training compute: single cluster — 64 × Nvidia H100 GPUs
- On-device capability: laptop-class, offline-capable