Published on: April 6, 2025 / updated: April 6, 2025 - Author: Konrad Wolfenstein
Llama 4: The new generation of open AI systems from Meta
Llama 4 revealed: Meta's key to the next AI age
Meta presented the latest generation of its AI models, Llama 4, on April 5, 2025. These new models mark significant progress in the development of open AI systems and introduce a number of new capabilities that considerably improve their performance and efficiency. The Llama 4 series consists of several models: two are already publicly available, while the most powerful one is still in training.
The Llama 4 model family
Meta has developed three different models in the Llama 4 series, each of which is optimized for different applications:
Llama 4 Scout
Llama 4 Scout is a compact model with impressive technical specifications:
- 17 billion active parameters with 16 experts (a total of 109 billion parameters)
- Can be operated on a single NVIDIA H100 GPU with INT4 quantization
- Has a remarkably large context window of 10 million tokens, which makes it one of the first open models with this capacity
According to Meta, Scout outperforms other models in its class, such as Gemma 3, Gemini 2.0 Flash-Lite and Mistral 3.1. It is particularly suitable for tasks such as summarizing long documents, personalization based on user data and complex reasoning over large bodies of knowledge.
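The single-GPU claim above can be checked with back-of-the-envelope arithmetic. The following is a minimal sketch that counts only the memory needed for the weights; it assumes the 80 GB variant of the H100 and ignores activations and the key-value cache, so it gives a lower bound rather than a deployment guide. The function name `weight_memory_gb` is made up for this illustration.

```python
def weight_memory_gb(total_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights only, in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


SCOUT_TOTAL_PARAMS = 109e9  # total parameters, taken from the list above
H100_MEMORY_GB = 80         # assumption: the 80 GB H100 variant

for name, bits in [("BF16", 16), ("INT8", 8), ("INT4", 4)]:
    gb = weight_memory_gb(SCOUT_TOTAL_PARAMS, bits)
    verdict = "fits" if gb < H100_MEMORY_GB else "does not fit"
    print(f"{name}: {gb:.1f} GB of weights, {verdict} in {H100_MEMORY_GB} GB")
```

Under these assumptions, only the 4-bit variant leaves room on a single card, which is consistent with the INT4 requirement stated above.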
Llama 4 Maverick
Llama 4 Maverick is the more powerful of the two available models:
- 17 billion active parameters with 128 experts (a total of 400 billion parameters)
- The experimental chat version reached an Elo score of 1417 on LMArena
- Outperforms models such as GPT-4o and Gemini 2.0 Flash in numerous benchmarks
This model is particularly suitable for general assistant and chat applications such as creative writing. In reasoning and coding tasks it delivers results comparable to DeepSeek V3, with fewer than half the active parameters.
Llama 4 Behemoth
Llama 4 Behemoth is Meta's most powerful model, which is not yet publicly available:
- 288 billion active parameters with 16 experts (a total of almost 2 trillion parameters)
- According to Meta, it outperforms GPT-4.5, Claude Sonnet 3.7 and Gemini 2.0 Pro on several STEM benchmarks
- Serves as a “teacher model” for the smaller Llama 4 models
Behemoth is currently still in the training phase and will be published at a later date.
Technical innovations
The Llama 4 model series introduces several important technical innovations that improve their performance and efficiency:
Mixture of Experts (MoE) architecture
One of the most important innovations in Llama 4 is the Mixture of Experts (MoE) architecture, in which only a fraction of the model parameters is activated for each token:
- This significantly reduces compute cost and latency while preserving high performance
- In Llama 4 Maverick, each token is processed by a shared expert and one of 128 routed experts
- This architecture makes it possible to increase the model's total parameter count without a proportional increase in inference cost
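The routing scheme described above can be sketched in a few lines. This is a toy illustration, not Meta's implementation: the hidden size, the random weights, the use of plain linear maps as "experts" and the sigmoid gate are all assumptions made for the example. Only the structure (one shared expert plus the top-scoring one of 128 routed experts) comes from the text.

```python
import math
import random

NUM_ROUTED_EXPERTS = 128  # from the Maverick description above
DIM = 8                   # toy hidden size, chosen only for illustration


def make_matrix(rows, cols, rng):
    """Random weight matrix stored as a list of rows."""
    return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


rng = random.Random(0)
# Toy experts: a single linear map each (real experts are feed-forward networks).
shared_expert = make_matrix(DIM, DIM, rng)
routed_experts = [make_matrix(DIM, DIM, rng) for _ in range(NUM_ROUTED_EXPERTS)]
router = make_matrix(NUM_ROUTED_EXPERTS, DIM, rng)


def moe_layer(x):
    """Process one token: the shared expert always runs, plus one routed expert."""
    scores = matvec(router, x)  # one score per routed expert
    chosen = max(range(NUM_ROUTED_EXPERTS), key=scores.__getitem__)
    gate = 1.0 / (1.0 + math.exp(-scores[chosen]))  # sigmoid gate (assumption)
    shared_out = matvec(shared_expert, x)
    routed_out = matvec(routed_experts[chosen], x)
    return [s + gate * r for s, r in zip(shared_out, routed_out)], chosen


token = [rng.gauss(0, 1) for _ in range(DIM)]
output, expert_id = moe_layer(token)
print(f"token routed to expert {expert_id}; "
      f"experts executed: 2 of {NUM_ROUTED_EXPERTS + 1}")
print(f"active share of Maverick's parameters: {17e9 / 400e9:.2%}")
```

The last line makes the efficiency argument concrete: with 17 billion of 400 billion parameters active, each token touches only a few percent of the model.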
Native multimodality with early fusion
The Llama 4 models are Meta's first open models with native multimodality through early fusion:
- Text and image tokens are integrated into a unified model architecture
- This enables joint pre-training on large quantities of text, image and video data
- In contrast to Llama 3.2, which used separate parameters for text and images, Llama 4 handles both modalities natively with the same parameters
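The idea of early fusion can be illustrated with a small sketch: text tokens and image patches are each mapped into the same embedding space and then concatenated into one sequence, which a single shared backbone would process. All sizes, the linear patch projection and the image-before-text ordering are assumptions chosen for the example; they are not taken from Meta's description.

```python
import random

DIM = 4          # toy embedding size
VOCAB_SIZE = 10  # toy text vocabulary
PATCH_SIZE = 6   # toy number of raw values per image patch

rng = random.Random(0)
# Lookup table for text tokens and a linear projection for image patches.
text_embedding = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]
patch_projection = [[rng.gauss(0, 1) for _ in range(PATCH_SIZE)] for _ in range(DIM)]


def embed_text(token_ids):
    """Map each text token id to its DIM-sized embedding."""
    return [text_embedding[t] for t in token_ids]


def embed_image(patches):
    """Project each raw image patch into the same DIM-sized space."""
    return [[sum(w * v for w, v in zip(row, patch)) for row in patch_projection]
            for patch in patches]


def early_fusion(token_ids, patches):
    """Build one sequence of image and text embeddings for a shared backbone."""
    return embed_image(patches) + embed_text(token_ids)


patches = [[rng.random() for _ in range(PATCH_SIZE)] for _ in range(2)]
sequence = early_fusion([3, 1, 4], patches)
print(f"fused sequence: {len(sequence)} positions, each of size {len(sequence[0])}")
```

Because both modalities end up as vectors of the same size in one sequence, the layers that follow need no separate image-specific parameters.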
Extremely long context window
The extremely long context window of Llama 4 Scout is particularly impressive:
- At 10 million tokens, it far exceeds the context windows of most available models
- This enables the processing of very long documents, entire code bases or extensive conversations
- The iRoPE architecture (interleaved attention layers) makes this possible
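The source does not spell out how the interleaving works. As a rough sketch, one way to picture it is a layer stack in which most attention layers apply rotary position embeddings (RoPE) while some layers at regular intervals apply none. The interval of four used below is a hypothetical value for illustration only, and `layer_uses_rope` is a made-up helper, not part of any Llama 4 API.

```python
import math


def rope(vec, position, base=10000.0):
    """Apply rotary position embedding to an even-length vector.

    Each consecutive pair of values is rotated by an angle that depends
    on the token position and the pair's index.
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = position * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out += [vec[i] * c - vec[i + 1] * s, vec[i] * s + vec[i + 1] * c]
    return out


def layer_uses_rope(layer_index, nope_interval=4):
    """Hypothetical schedule: every `nope_interval`-th layer skips RoPE."""
    return (layer_index + 1) % nope_interval != 0


def position_encode(vec, position, layer_index):
    """Rotate the vector in RoPE layers, leave it unchanged in the others."""
    return rope(vec, position) if layer_uses_rope(layer_index) else list(vec)


schedule = ["RoPE" if layer_uses_rope(i) else "none" for i in range(8)]
print(schedule)
```

The rotation changes only the direction of each pair, not its length, so layers with and without positional information can share the same representation.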
New training methods
Meta has used several innovative methods for the training of Llama 4:
- MetaP: a technique for reliably setting critical model hyperparameters
- FP8 precision: use of 8-bit floating-point numbers for efficient training
- Co-distillation: use of Llama 4 Behemoth as a teacher model for the smaller models
- Fully asynchronous online reinforcement learning: a new infrastructure for large-scale training
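To make the teacher-student idea behind distillation more concrete, here is a minimal sketch of a generic distillation loss for a single prediction. It blends a "soft" term (matching the teacher's output distribution) with a "hard" term (matching the true label). The fixed `soft_weight` is a hypothetical parameter for this example; the source does not describe how Meta weights or schedules these terms.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label, soft_weight=0.5):
    """Weighted sum of a teacher-matching term and a true-label term."""
    student = softmax(student_logits)
    teacher = softmax(teacher_logits)
    # Soft target: cross-entropy of the student against the teacher's distribution.
    soft = -sum(t * math.log(s) for t, s in zip(teacher, student))
    # Hard target: ordinary cross-entropy against the correct label.
    hard = -math.log(student[true_label])
    return soft_weight * soft + (1 - soft_weight) * hard


teacher_logits = [4.0, 1.0, 0.5]  # the teacher is confident in class 0
good_student = [3.5, 1.0, 0.2]    # agrees with the teacher
poor_student = [0.1, 2.0, 0.3]    # prefers the wrong class

print(f"good student loss: {distillation_loss(good_student, teacher_logits, 0):.3f}")
print(f"poor student loss: {distillation_loss(poor_student, teacher_logits, 0):.3f}")
```

A student whose predictions resemble the teacher's receives a lower loss, which is the signal that pulls the smaller model toward the larger one's behavior.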
Availability and integration
The Llama 4 models are available via various platforms and services:
Download and cloud providers
- The Scout and Maverick models can be downloaded directly from Meta or via Hugging Face
- They are also available via various cloud platforms:
  - Cloudflare Workers AI
  - Azure AI Foundry and Azure Databricks
  - Google Cloud's Vertex AI
- Other partners will follow in the coming days
Integration into Meta products
Meta has already upgraded its AI assistant on various platforms to Llama 4:
- WhatsApp, Messenger and Instagram Direct in 40 countries
- The Meta.ai website
- However, the multimodal functions are currently only available to English-language users in the USA
License and controversy
Although Meta describes Llama 4 as “open source”, the license contains some restrictions that have triggered controversy:
License restrictions
The Llama 4 Community License contains several restrictions:
- Companies with more than 700 million monthly active users need a special license from Meta
- Users and companies based in the EU are apparently not permitted to use or distribute the models, presumably due to regulatory requirements
- There are requirements regarding naming and attribution for derived models
Debate about “Open Source”
There is a debate about whether Llama 4 should actually be called “Open Source”:
- The Open Source Initiative stated in 2023 that the restrictions in the Llama license take it “out of the category of 'Open Source'”
- Critics argue that it is more of a “source-available” or “open-weights” model than real open source software
- The license restrictions could be problematic for small companies without their own legal departments
Future plans
Meta has already given some insight into its future plans for Llama 4 and beyond:
LlamaCon and other announcements
- Meta will hold its first LlamaCon conference on April 29, 2025, where further details on its AI models and product plans are to be announced
- The company also plans to release a dedicated app for its Meta AI chatbot in the second quarter
Expansion of voice capabilities
- Meta is working on improving Llama 4's voice capabilities to enable more natural conversations
- The aim is to enable more fluid, two-way dialogues in which users can interrupt the AI model
- Chris Cox, Chief Product Officer of Meta, described the upcoming Llama 4 as an “omni model” that handles speech natively instead of translating speech into text
Agentic AI and extended capabilities
- Mark Zuckerberg has announced that Llama 4 will have “agentic capabilities” that should enable new applications
- Meta aims to develop AI models that “carry out generalized actions, communicate naturally with people and solve challenging problems”
- The company is considering offering premium subscriptions for its AI assistant for agentic tasks such as making reservations or producing videos
Why Llama 4 is a turning point in the AI landscape
The release of Llama 4 represents a significant step in Meta's strategy to become a leader in the highly competitive field of generative AI. With the introduction of the Mixture of Experts architecture, native multimodality and an impressively long context window, Meta shows that open models can keep up with the proprietary models of the large technology companies.
Despite the controversy about the licensing and the question of whether Llama 4 should really be called “open source”, the technical progress is an important milestone. The ability of the models to process both text and images opens up new opportunities for developers and companies.
With Llama 4 Behemoth still to come and the announced plans for expanded voice and agentic capabilities, it is clear that Meta will further intensify its investments in AI. The coming months will show how these new models change the AI landscape and whether, as predicted by Mark Zuckerberg, they will actually help open AI models take the lead in the field of artificial intelligence.