Llama 4: The new generation of open AI systems from Meta
Llama 4 Revealed: Meta's Key to the Next Age of AI
On April 5, 2025, Meta unveiled the latest generation of its AI models, Llama 4. These new models represent a significant advancement in the development of open AI systems and feature a number of groundbreaking capabilities that substantially enhance their performance and efficiency. The Llama 4 series comprises several models, two of which are already publicly available, while the most powerful model is still in the training phase.
The Llama 4 model family
Meta has developed three different models in the Llama 4 series, each optimized for different use cases:
Llama 4 Scout
Llama 4 Scout is a compact model with impressive technical specifications:
- 17 billion active parameters with 16 experts (a total of 109 billion parameters)
- Can be operated on a single NVIDIA H100 GPU with Int4 quantization
- It features a remarkably large context window of 10 million tokens, making it one of the first open models with this capacity
According to Meta, Scout outperforms other models in its class, such as Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1. It is particularly well-suited for tasks such as summarizing long documents, personalizing content based on user data, and reasoning over large knowledge bases.
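The single-GPU claim for Scout follows from simple arithmetic on weight storage. The sketch below is a rough estimate only: it assumes the 80 GB variant of the H100 and counts the weights alone, ignoring the KV cache, activations and runtime overhead. The function and constant names are illustrative.

```python
def weight_memory_gb(total_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9

SCOUT_TOTAL_PARAMS = 109e9   # from the article: 109 billion total parameters
H100_MEMORY_GB = 80          # assumption: the 80 GB H100 variant

for name, bits in [("BF16", 16), ("Int8", 8), ("Int4", 4)]:
    gb = weight_memory_gb(SCOUT_TOTAL_PARAMS, bits)
    verdict = "fits" if gb < H100_MEMORY_GB else "does not fit"
    print(f"{name}: {gb:.1f} GB of weights, {verdict} in {H100_MEMORY_GB} GB")
```

Note that all 109 billion parameters must be resident in memory even though only 17 billion are active per token, which is why quantization matters here. Very long contexts add KV-cache memory on top of this figure.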
Llama 4 Maverick
Llama 4 Maverick is the more powerful of the two available models:
- 17 billion active parameters with 128 experts (400 billion parameters in total)
- The experimental chat version reached an Elo score of 1417 on LMArena
- According to Meta, it outperforms models like GPT-4o and Gemini 2.0 Flash in numerous benchmarks
This model is particularly suitable for general assistant and chat applications, including creative writing. According to Meta, it achieves results comparable to DeepSeek v3 on reasoning and coding tasks with less than half the active parameters.
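The parameter comparison with DeepSeek v3 can be checked with a little arithmetic. In the sketch below the Llama 4 figures come from this article, while the DeepSeek-V3 figures (671 billion total, 37 billion active) are taken from its public model card and are not stated here, so treat them as an outside assumption.

```python
# Parameter counts in billions. Llama 4 figures are from the article;
# the DeepSeek-V3 figures are an outside assumption (public model card).
MODELS = {
    "Llama 4 Scout": {"active": 17, "total": 109},
    "Llama 4 Maverick": {"active": 17, "total": 400},
    "DeepSeek-V3": {"active": 37, "total": 671},
}

def active_fraction(model: dict) -> float:
    """Share of a model's parameters that is used for each token."""
    return model["active"] / model["total"]

for name, m in MODELS.items():
    print(f"{name}: {active_fraction(m):.1%} of parameters active per token")

active_ratio = MODELS["Llama 4 Maverick"]["active"] / MODELS["DeepSeek-V3"]["active"]
total_ratio = MODELS["Llama 4 Maverick"]["total"] / MODELS["DeepSeek-V3"]["total"]
print(f"Maverick vs DeepSeek-V3, active parameters: {active_ratio:.2f}")
print(f"Maverick vs DeepSeek-V3, total parameters: {total_ratio:.2f}")
```

Under these figures the "half" comparison holds for active parameters, not for total parameters.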
Llama 4 Behemoth
Llama 4 Behemoth is Meta's most powerful model, but it is not yet publicly available:
- 288 billion active parameters with 16 experts (almost 2 trillion parameters in total)
- According to Meta, it outperforms GPT-4.5, Claude Sonnet 3.7 and Gemini 2.0 Pro in several STEM benchmarks
- Serves as a “teacher model” for the smaller Llama 4 models
Behemoth is currently still in the training phase and will be released at a later date.
Technical innovations
The Llama 4 model range introduces several significant technical innovations that improve its performance and efficiency:
Mixture of Experts (MoE) Architecture
One of the most important innovations in Llama 4 is the Mixture of Experts (MoE) architecture, in which only a subset of the model parameters is activated for each token:
- This significantly reduces computational effort and latency, while maintaining high performance
- In Llama 4 Maverick, each token is processed by a shared expert and one of 128 routed experts
- This architecture makes it possible to increase the total parameter count of the model without a proportional increase in compute per token, although all parameters must still be held in memory
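The routing idea described above, a shared expert plus one routed expert per token, can be sketched in a few lines. This is a toy illustration and not Meta's implementation: the sizes, the random weights, the sigmoid gate and all names are assumptions chosen for readability.

```python
import math
import random

random.seed(0)

DIM = 4            # toy hidden size (hypothetical)
NUM_ROUTED = 8     # toy value; the article states 128 routed experts for Maverick

def make_matrix(rows, cols):
    """Random weights standing in for a trained layer."""
    return [[random.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

shared_expert = make_matrix(DIM, DIM)
routed_experts = [make_matrix(DIM, DIM) for _ in range(NUM_ROUTED)]
router = make_matrix(NUM_ROUTED, DIM)   # one score row per routed expert

def moe_layer(x):
    """Shared expert plus the single best-scoring routed expert (top-1 routing)."""
    scores = matvec(router, x)
    best = max(range(NUM_ROUTED), key=scores.__getitem__)
    gate = 1 / (1 + math.exp(-scores[best]))   # assumption: sigmoid gate
    shared_out = matvec(shared_expert, x)
    routed_out = matvec(routed_experts[best], x)
    return [s + gate * r for s, r in zip(shared_out, routed_out)], best

token = [0.3, -1.2, 0.8, 0.1]
output, chosen = moe_layer(token)
print(f"routed expert {chosen} of {NUM_ROUTED} handled this token")
print(f"expert matrices evaluated: 2 of {NUM_ROUTED + 1}")
```

Only two expert matrices are multiplied per token regardless of how many routed experts exist, which is where the compute saving comes from.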
Native multimodality with early fusion
Llama 4 is the first Llama generation designed with native multimodality through early fusion:
- Text and image tokens are integrated into a unified model architecture
- This enables joint pre-training with large amounts of text, image and video data
- Unlike Llama 3.2, which used separate parameters for text and images, Llama 4 understands both modalities natively with the same parameters
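As a rough illustration of early fusion, the sketch below maps text tokens and image patches into the same embedding width and joins them into one sequence that a single backbone would then process. The vocabulary, patch size, random weights and the ordering of image before text are all hypothetical; Llama 4's actual vision encoder and tokenization are not described in this article.

```python
import random

random.seed(0)

DIM = 4                                    # toy model width (hypothetical)
VOCAB = {"describe": 0, "this": 1, "image": 2}
PATCH_SIZE = 3                             # toy: each patch is 3 raw values

text_embedding = [[random.gauss(0, 1) for _ in range(DIM)] for _ in VOCAB]
patch_projection = [[random.gauss(0, 1) for _ in range(PATCH_SIZE)] for _ in range(DIM)]

def embed_text(words):
    """Look up one DIM-wide vector per text token."""
    return [text_embedding[VOCAB[w]] for w in words]

def embed_patches(patches):
    """Project each raw image patch into the same DIM-wide space as text tokens."""
    return [[sum(w * p for w, p in zip(row, patch)) for row in patch_projection]
            for patch in patches]

def fuse(words, patches):
    """Early fusion: one sequence holding both modalities."""
    return embed_patches(patches) + embed_text(words)

patches = [[0.1, 0.5, 0.9], [0.2, 0.4, 0.6]]
sequence = fuse(["describe", "this", "image"], patches)
print(len(sequence), "tokens, each of width", len(sequence[0]))
```

Because every element of the fused sequence has the same width, the layers that follow need no modality-specific parameters.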
Extremely long context window
The extremely long context window of Llama 4 Scout is particularly impressive:
- With 10 million tokens, it significantly surpasses most available models
- This enables the processing of very long documents, entire codebases, or extensive conversations
- The iRoPE architecture makes this possible: most layers use rotary position embeddings (RoPE), interleaved with attention layers that use no positional embeddings
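Two ideas behind this design can be illustrated with a small sketch. First, rotary position embeddings make attention scores depend on the distance between tokens, not on their absolute positions. Second, layers with and without positional embeddings alternate in a fixed pattern. The frequency `theta`, the layer count and the interval of 4 are assumptions for illustration, not figures from this article.

```python
import math

def rope_rotate(vec, position, theta=0.1):
    """Rotate a 2-D vector by position * theta (a single RoPE frequency pair)."""
    x, y = vec
    angle = position * theta
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

q, k = (1.0, 0.5), (0.3, -0.7)

# Same relative distance (3 tokens), very different absolute positions.
near = dot(rope_rotate(q, 5), rope_rotate(k, 2))
far = dot(rope_rotate(q, 1_000_005), rope_rotate(k, 1_000_002))
print(f"score at positions (5, 2):             {near:.6f}")
print(f"score at positions (1000005, 1000002): {far:.6f}")

# Hypothetical interleaving: every 4th layer skips positional embeddings.
NOPE_INTERVAL = 4    # assumption for illustration
NUM_LAYERS = 12      # toy depth
schedule = ["NoPE" if (i + 1) % NOPE_INTERVAL == 0 else "RoPE"
            for i in range(NUM_LAYERS)]
print(schedule)
```

The two scores agree up to floating-point error, which shows why a rotary scheme does not inherently tie a model to the absolute positions it saw during training.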
New training methods
Meta has used several innovative methods for training Llama 4:
- MetaP: A technique for robustly tuning critical model hyperparameters
- FP8 precision: Using 8-bit floating-point numbers for efficient training
- Co-distillation: Using Llama 4 Behemoth as a teacher model for smaller models
- Fully asynchronous online reinforcement learning (RL): A new training infrastructure for RL at large scale
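To make the teacher-student idea concrete, here is a minimal sketch of a generic distillation loss that blends a soft target (the teacher's output distribution) with a hard target (the true label). It is not Meta's loss function: the logits, the weighting schedule and all names are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs):
    return -sum(t * math.log(p) for t, p in zip(target_probs, predicted_probs) if t > 0)

def distillation_loss(student_logits, teacher_logits, true_index, soft_weight):
    """Weighted blend of matching the teacher (soft) and matching the label (hard)."""
    student = softmax(student_logits)
    teacher = softmax(teacher_logits)
    hard = [1.0 if i == true_index else 0.0 for i in range(len(student_logits))]
    soft_loss = cross_entropy(teacher, student)
    hard_loss = cross_entropy(hard, student)
    return soft_weight * soft_loss + (1 - soft_weight) * hard_loss

student_logits = [1.0, 0.5, -0.2]
teacher_logits = [2.5, 0.3, -1.0]

# Hypothetical schedule: lean on the teacher early, on the labels later.
for step, w in [(0, 0.9), (500, 0.5), (1000, 0.1)]:
    loss = distillation_loss(student_logits, teacher_logits, true_index=0, soft_weight=w)
    print(f"step {step}: soft_weight={w}, loss={loss:.4f}")
```

The soft target carries more information per example than a single label, because it also tells the student how plausible the teacher finds each alternative.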
Availability and integration
The Llama 4 models are available through various platforms and services:
Download and cloud providers
- The Scout and Maverick models can be downloaded directly from Meta or via Hugging Face
- They are also available via various cloud platforms:
  - Cloudflare Workers AI
  - Azure AI Foundry and Azure Databricks
  - Google Cloud's Vertex AI
- More partners will follow in the coming days
Integration into Meta products
Meta has already updated its AI assistants to Llama 4 across various platforms:
- WhatsApp, Messenger and Instagram Direct in 40 different countries
- The Meta AI website
- However, the multimodal features are currently only available to English-speaking users in the USA
Licensing and Controversies
Although Meta describes Llama 4 as “open source”, the license contains restrictions that have sparked controversy:
License restrictions
The Llama 4 Community License contains several restrictions:
- Companies with more than 700 million monthly active users require a special license from Meta
- Users and companies from the EU are apparently not allowed to use or distribute the models, presumably due to regulatory requirements
- There are requirements regarding the naming and attribution of derived models
Debate about “Open Source”
There is a debate about whether Llama 4 should actually be called “Open Source”:
- The Open Source Initiative determined in 2023 that the restrictions in the Llama license take it “out of the 'Open Source' category”
- Critics argue that it is more of a “source-available” or “open-weights” model than true open-source software
- The licensing restrictions could be problematic for small businesses without their own legal departments
Future plans
Meta has already given some insight into its future plans for Llama 4 and beyond:
LlamaCon and other announcements
- Meta will host its first LlamaCon conference on April 29, 2025, where further details about its AI models and product plans will be announced
- The company also plans to release a dedicated app for its Meta AI chatbot in the second quarter
Expanding speech capabilities
- Meta is working to improve Llama 4's speech capabilities to enable more natural conversations
- The goal is to enable smoother, two-way dialogues in which users can interrupt the AI model
- Chris Cox, Chief Product Officer of Meta, described the upcoming Llama 4 as an “omni-model” that handles speech natively instead of translating speech to text
Agentic AI and enhanced capabilities
- Mark Zuckerberg has announced that Llama 4 will have “agentic capabilities” that will enable new use cases
- Meta aims to develop AI models that can “perform generalized actions, communicate naturally with humans, and solve challenging problems.”
- The company is considering offering premium subscriptions for its AI assistant for agent-related purposes such as reservations or video production
Why Llama 4 is a turning point in the AI landscape
The release of Llama 4 represents a significant step in Meta's strategy to become a leader in the highly competitive field of generative AI. With the introduction of the Mixture of Experts architecture, native multimodality, and an impressively long context window, Meta demonstrates that open models can compete with the proprietary models of major technology companies.
Despite the controversies surrounding licensing and the question of whether Llama 4 should truly be called “open source,” the technical advancements represent a significant milestone. The models' ability to process both text and images opens up new possibilities for developers and businesses.
With Llama 4 Behemoth still pending and plans announced for enhanced speech and agentic capabilities, it is clear that Meta will further intensify its investments in AI. The coming months will show how these new models change the AI landscape and whether they will indeed, as Mark Zuckerberg predicted, help open AI models become the leading force in artificial intelligence.


