Gemma 4 Arrives: Google Drops Restrictions, Embraces True Open Models
Google on Thursday unveiled Gemma 4, its most capable family of open-weight models to date, built on the same research foundation as its proprietary Gemini 3 system.
But beyond the performance numbers, the announcement carries a change that may prove just as consequential for the industry: Gemma 4 ships under a standard Apache 2.0 license, dropping the restrictive Google-specific terms that had long made enterprises hesitant to fully commit to the platform.
Gemma 4 arrives in four sizes designed to cover the full spectrum of hardware, from the phone in your pocket to a high-end developer workstation.
At the lower end sit the Effective 2B (E2B) and Effective 4B (E4B), compact models engineered specifically for smartphones, Raspberry Pi boards, and IoT devices. Despite their size, both pack a 128,000-token context window and can natively process images, video, and audio inputs, all while running fully offline.
Google developed these edge models in collaboration with its Pixel team, Qualcomm Technologies, and MediaTek.
Stepping up, the 26B Mixture of Experts (MoE) and 31B Dense models target personal computers, consumer GPUs, and professional workstations. The larger two support context windows of up to 256,000 tokens, long enough to feed an entire code repository or lengthy document into a single prompt.
The 31B Dense model currently sits at third place among all open models on the Arena AI text leaderboard, while the 26B MoE holds the sixth spot. Google says both outperform models up to 20x their size by parameter count, a striking claim that independent benchmarks from Artificial Analysis appear to largely support.
The license change that changes everything
For enterprise teams, the headline may not be benchmarks at all.
Previous versions of Gemma carried a proprietary Google license with usage restrictions, driving many organizations to Mistral or Alibaba’s Qwen instead, not because those models were better, but because their terms were cleaner.
Gemma 4 eliminates that friction. The Apache 2.0 license is the same permissive standard used by much of the broader open-weight ecosystem. There are no custom carve-outs, no restrictions on commercial redistribution, and no risk of Google changing the rules after deployment.
Hugging Face co-founder and CEO Clément Delangue called it a turning point. “The release of Gemma 4 under an Apache 2.0 license is a huge milestone,” he said in a statement provided by Google. “We are incredibly excited to support the Gemma 4 family on Hugging Face on day one.”
Intelligence per parameter and what that really means
Google has leaned heavily into the phrase “intelligence-per-parameter” to describe Gemma 4, and the benchmarks give that claim some teeth.
The 31B Dense model scores 89.2% on AIME 2026, a rigorous mathematical reasoning test, and 80.0% on LiveCodeBench v6 for competitive coding. On GPQA Diamond, a graduate-level scientific reasoning benchmark, it hits 84.3%. For comparison, last generation’s Gemma 3 27B managed 20.8% on AIME and 29.1% on LiveCodeBench without thinking mode enabled.
The MoE architecture in the 26B model is particularly interesting from an infrastructure standpoint. It activates only 3.8 billion of its total parameters during inference, delivering reasoning quality competitive with much larger dense models while generating tokens at speeds closer to a 4B model. For enterprises running coding assistants or document processing pipelines at scale, that translates directly into fewer GPUs and lower per-token costs.
The unquantized 31B model fits on a single 80GB NVIDIA H100 GPU. At 4-bit quantization, it can run on consumer cards like an NVIDIA RTX 4090 or AMD RX 7900 XTX.
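Those hardware claims are easy to sanity-check with weights-only arithmetic. The sketch below is a rough lower bound, since a real deployment also needs VRAM for activations and the KV cache:

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Estimate weight storage in decimal gigabytes (weights only)."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# 31B parameters at 16-bit (bf16) precision: ~62 GB, within an 80 GB H100.
bf16 = weight_memory_gb(31, 16)
# The same model at 4-bit quantization: ~15.5 GB, within a 24 GB RTX 4090.
int4 = weight_memory_gb(31, 4)
print(f"bf16: {bf16:.1f} GB, 4-bit: {int4:.1f} GB")
```

The same arithmetic explains the MoE model's speed: with only 3.8 billion of 26 billion parameters active per token, per-token compute is roughly 15% of what a dense model of the same total size would require.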
Multimodality baked in, not bolted on
All four Gemma 4 models natively process images and video, with support for variable resolutions and tasks like optical character recognition and chart understanding. The two edge models add native audio processing for speech recognition, with Google noting the audio encoder was significantly compressed compared to the prior generation for more responsive on-device transcription.
Function calling, structured JSON output, and native system instructions are built into all four models from the ground up, rather than relying on prompt engineering workarounds. The models also support more than 140 languages, which Google says makes them viable for global application development.
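To illustrate what native function calling buys developers, the sketch below uses a generic JSON-Schema-style tool declaration of the kind most function-calling APIs share; the tool name and exact format are hypothetical, not Gemma 4's documented interface:

```python
import json

# Hypothetical tool declaration in the common JSON-Schema style;
# Gemma 4's actual declaration format may differ.
get_weather_tool = {
    "name": "get_weather",
    "description": "Return current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

# A model with built-in function calling emits a structured call rather
# than free-form prose, so the host application can parse and dispatch
# it directly without prompt-engineering workarounds.
model_output = '{"name": "get_weather", "arguments": {"city": "Berlin", "unit": "celsius"}}'
call = json.loads(model_output)
print(call["name"], call["arguments"]["city"])
```

The point is the guarantee, not the schema itself: when structured output is trained in rather than coaxed via prompting, the `json.loads` step above can be trusted not to fail on malformed text.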
Where to get it
Gemma 4 is available now on Hugging Face, Kaggle, and Ollama.
The 31B and 26B MoE models are accessible through Google AI Studio, while the edge models can be explored in Google AI Edge Gallery. Day-one framework support includes Hugging Face Transformers, vLLM, llama.cpp, MLX, Ollama, NVIDIA NIM, LM Studio, and others. Fine-tuning is supported via Google Colab, Vertex AI, and local consumer GPUs.
For cloud deployment, the models scale through Vertex AI, Cloud Run, and Google Kubernetes Engine.
Stay ahead of Google’s broader AI push with our full breakdown of its March 2026 updates, where Gemini evolves into a proactive, deeply personalized assistant across search, devices, and apps.
The post Gemma 4 Arrives: Google Drops Restrictions, Embraces True Open Models appeared first on eWEEK.