Llama 4, Meta’s answer to DeepSeek, is here! The long-context Scout and Maverick models are available now, with a 2-trillion-parameter Behemoth coming soon.

Image credit: VentureBeat, created with Midjourney v7



In January 2025, the AI landscape changed dramatically when Chinese AI startup DeepSeek publicly launched its powerful open-source reasoning model, DeepSeek R1, which outperformed offerings from U.S. tech titans like Meta.

DeepSeek usage spread quickly among researchers and businesses. Meta was reportedly in panic mode upon learning that R1 had been trained for a fraction of the cost of many other leading AI models, reportedly as little as several million dollars (roughly what Meta pays some of its own AI team leaders), yet still achieved top performance in the open-source category.

Meta’s generative AI strategy had until then been built on releasing best-in-class open-source models under its “Llama” brand for researchers and companies alike to build upon (at least, those with fewer than 700 million monthly users; larger organizations are supposed to contact Meta for paid licensing terms).

But DeepSeek R1’s astonishingly high performance on a much smaller budget allegedly shook Meta’s leadership and forced a reckoning. The latest version of Llama, 3.3, had been released just a month earlier, in December 2024, yet already looked outdated.

We now know the results of that reckoning. Today, Meta founder and CEO Mark Zuckerberg announced the new Llama 4 family on his Instagram account. Developers can download the new Llama 4 models and start using or fine-tuning them immediately from llama.com and the AI code-sharing community Hugging Face.

A massive 2-trillion-parameter Llama 4 Behemoth is also being previewed today. Meta’s blog post on the releases said it is still in training, and gave no indication of when it might be released. (Remember, parameters are the settings that govern a model’s behavior; more parameters generally mean a more complex and powerful model.)

The models are multimodal – they can receive and generate text, video and imagery. Audio was not mentioned.

They also have extremely long context windows: 10 million tokens for Llama 4 Scout and 1 million tokens for Llama 4 Maverick, equivalent to roughly 15,000 and 1,500 pages of text, respectively, all of which a model can handle in a single input/output interaction. That means a user could theoretically upload up to 7,500 pages’ worth of text to Llama 4 Scout and receive the same amount back in a single exchange, which could be useful for information-dense fields such as medicine, science, engineering, mathematics and literature.
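The token-to-page figures above come from a rough rule of thumb. A minimal sketch of that arithmetic, assuming roughly 500 words per dense text page and 0.75 words per token (my assumptions, not Meta's published numbers):

```python
# Back-of-the-envelope conversion between context-window tokens and text
# pages, matching the article's figures. The constants are assumptions:
# ~500 words per text-heavy page, ~0.75 English words per token.

WORDS_PER_PAGE = 500
WORDS_PER_TOKEN = 0.75

def tokens_to_pages(tokens: int) -> int:
    """Approximate number of text pages that fit in `tokens`."""
    return round(tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE)

print(tokens_to_pages(10_000_000))  # Scout's 10M-token window -> 15000 pages
print(tokens_to_pages(1_000_000))   # Maverick's 1M-token window -> 1500 pages
```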

Here is what we know about this release:

All-in on mixture-of-experts

All three models use the mixture-of-experts (MoE) architecture, an approach popularized in earlier model releases from OpenAI and Mistral. MoE combines multiple smaller models specialized (“experts”) in different tasks, subjects and media formats into a larger, unified model. Each Llama 4 release is therefore said to be a mixture of 128 different experts. This is more efficient to run, because only the expert needed for a particular task, plus a “shared” expert, handles each token, rather than the entire model having to run for every one.
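The routing idea described above can be sketched in a few lines. This is a minimal, illustrative top-1 MoE forward pass with a shared expert; the dimensions, expert count and gating scheme are my simplifications, not Llama 4's actual configuration:

```python
import numpy as np

# Minimal sketch of MoE token routing: each token is processed by one
# routed expert (chosen by a learned gate) plus a shared expert, instead
# of the full dense model. Sizes are illustrative only.

rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 64, 128, 4

experts = rng.normal(size=(n_experts, d_model, d_model)) * 0.02  # routed experts
shared = rng.normal(size=(d_model, d_model)) * 0.02              # shared expert
gate_w = rng.normal(size=(d_model, n_experts)) * 0.02            # router weights

tokens = rng.normal(size=(n_tokens, d_model))
out = np.empty_like(tokens)
for i, tok in enumerate(tokens):
    logits = tok @ gate_w
    e = int(np.argmax(logits))                 # top-1 routed expert for this token
    out[i] = tok @ experts[e] + tok @ shared   # routed + shared contributions
```

Only 2 of the 129 expert matrices touch any given token, which is why serving cost scales with the active parameters rather than the total parameter count.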

The Llama blog post notes that:

As a result, while all parameters are stored in memory, only a subset are activated when serving these models. This reduces model serving costs and latency, improving inference efficiency. Llama 4 Maverick can run on a single H100 DGX host for easy deployment, or with distributed inference for maximum efficiency.

Both Scout and Maverick are available now for public self-hosting; no hosted API or pricing tiers for official Meta infrastructure have been announced. Instead, Meta is focusing on distribution through open download and integration with Meta AI in WhatsApp, Messenger and Instagram.

Meta estimates inference costs for Llama 4 Maverick at $0.19 to $0.49 per 1 million tokens (using a 3:1 blend of input and output tokens). That makes it significantly cheaper than proprietary models such as GPT-4o, which is estimated to cost $4.38 per 1 million tokens, according to community benchmarks.

In fact, shortly after this post was published, the AI inference provider Groq enabled Llama 4 Scout and Maverick at the following prices:

  • Llama 4 Scout: $0.11 per million input tokens and $0.34 per million output tokens, at a blended price of $0.13 per million.
  • Llama 4 Maverick: $0.50 per million input tokens and $0.77 per million output tokens, at a blended price of $0.53 per million.
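A "blended" price is simply a weighted average of input and output prices at an assumed input:output token ratio. A minimal sketch of that arithmetic (the `blended_price` helper and the 10:1 ratio are my own; the listed $0.13 Scout figure implies a much heavier input weighting than Meta's 3:1 blend):

```python
# Blended per-million-token price at a given input:output token ratio.
# Meta's Maverick estimate uses a 3:1 blend; providers may use other
# ratios, so published blended figures won't all use the same formula.

def blended_price(input_price: float, output_price: float, ratio: float = 3.0) -> float:
    """Weighted average price per 1M tokens, assuming `ratio` input tokens
    per output token."""
    return (ratio * input_price + output_price) / (ratio + 1)

# At a 3:1 blend, Scout's $0.11 in / $0.34 out works out to ~$0.17:
print(round(blended_price(0.11, 0.34), 3))
# A ~10:1 input-heavy blend reproduces the quoted $0.13 figure:
print(round(blended_price(0.11, 0.34, ratio=10.0), 2))  # 0.13
```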

All three Llama 4 models are designed to compete directly with “classical,” non-reasoning LLMs and multimodal models such as OpenAI’s GPT-4o and DeepSeek V3 (more on that below). The exception is Llama 4 Behemoth, which appears to threaten DeepSeek R1.

Meta also built custom post-training pipelines for Llama 4 focused on improving reasoning, for example:

• Removing over 50% of “easy” prompts during supervised fine-tuning.
• Adopting a continuous reinforcement learning loop with progressively harder prompts.
• Using curriculum sampling and pass@k evaluation to strengthen performance in math, logic and coding.
• Implementing MetaP, a new technique that lets engineers tune hyperparameters (such as per-layer learning rates) on one model and then apply them to other model sizes and token types while preserving the intended model behavior. MetaP is promising beyond this release, since hyperparameters set on one model can be used to derive many differently sized models from it, increasing training efficiency. Ben Dickson, a VentureBeat colleague and LLM expert, said of the new MetaP method: “This can save you a lot in terms of time and money,” since it means running experiments on smaller models rather than doing them at large scale.

That matters especially when training huge models like Behemoth, which uses 32,000 GPUs and FP8 precision to achieve 390 TFLOPs per GPU over more than 30 trillion training tokens, more than double Llama 3’s training data.

In other words, researchers can tell a model broadly how they want it to behave, then apply that to larger and smaller versions of the model and across different media types.
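Meta has not published MetaP's details. As a loose illustration of the general idea of transferring tuned hyperparameters across model sizes (in the spirit of muP-style width scaling, explicitly not Meta's actual method), one might rescale a per-layer learning rate as a model gets wider:

```python
# Hypothetical sketch of width-aware hyperparameter transfer. This is NOT
# Meta's actual MetaP algorithm (which is unpublished); it illustrates the
# general idea: tune a learning rate cheaply on a small proxy model, then
# rescale it for wider models instead of re-tuning from scratch.

def transfer_lr(base_lr: float, base_width: int, target_width: int) -> float:
    """muP-style rule of thumb: hidden-layer learning rate scales ~ 1/width."""
    return base_lr * base_width / target_width

base_lr = 3e-3     # assumed value, tuned on a small proxy model
base_width = 512   # hidden width of that proxy model

for width in (512, 2048, 8192):
    print(width, transfer_lr(base_lr, base_width, width))
```

The appeal, as the article notes, is economic: the expensive sweep happens once on the small model, and large runs reuse the result.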

A powerful model family, but not the most powerful yet

In an announcement video on Instagram (a Meta subsidiary), Meta CEO Mark Zuckerberg said the company’s “goal is to create the world’s best AI, open-source it, and make it universally accessible so that everyone around the world can benefit.”

This is a carefully worded claim, much as Meta’s blog post calls Llama 4 Scout “the world’s best multimodal model for its class.”

In other words, these are powerful models, near the top compared with others in their parameter-size classes, but not necessarily setting new performance records. Still, Meta was eager to tout the models its new Llama 4 family beats, among them:

Llama 4 Behemoth

• Outperforms GPT-4.5, Gemini 2.0 Pro and Claude Sonnet 3.7 on:
  • GPQA Diamond (73.7)
  • MMLU Pro (82.2)

Llama 4 Maverick

• Beats GPT-4o and Gemini 2.0 Flash on most multimodal reasoning benchmarks, including:
  • ChartQA (vs. GPT-4o’s 85.7)
  • DocVQA (vs. GPT-4o’s 92.8)
  • MathVista

Llama 4 Scout

• Outperforms Mistral 3.1 and Gemini 2.0 Flash-Lite on:
  • MMLU (74.3)
  • MathVista (70.7)
• Unmatched 10M-token context length, ideal for long documents, codebases or multi-turn analysis
• Designed for efficient deployment on a single H100 GPU

How does Llama 4 compare to DeepSeek?

There are, of course, other reasoning-heavy models to consider, such as DeepSeek R1, OpenAI’s “o” series (like o1), Gemini 2.0 and Claude Sonnet.

Comparing the highest-parameter benchmarked model, Llama 4 Behemoth, against the chart from DeepSeek’s initial R1 release (covering R1 and OpenAI’s o1) shows how Behemoth stacks up:

Benchmark       Llama 4 Behemoth    DeepSeek R1    OpenAI o1-1217
MATH-500        95.0                97.3           96.4
GPQA Diamond    73.7                71.5           75.7
MMLU            82.2                90.8           91.8

• MATH-500: Behemoth is slightly behind both DeepSeek R1 and OpenAI o1.
• GPQA Diamond: Behemoth is slightly ahead of DeepSeek R1 but behind OpenAI o1.
• MMLU: Behemoth trails both, but still outperforms Gemini 2.0 Pro and GPT-4.5.
• Takeaway: while DeepSeek R1 and OpenAI o1 edge out Behemoth on a few metrics, Llama 4 Behemoth remains highly competitive and performs at or near the top of the reasoning leaderboard in its class.

Safety and less political ‘bias’

Meta also emphasized model alignment and safety, introducing tools such as Llama Guard and Prompt Guard to help developers detect unsafe or adversarial inputs and outputs, and implementing Generative Offensive Agent Testing for automated red-teaming.

According to the company, Llama 4 also shows significant improvement on “political bias.” Meta says that leading LLMs “have historically leaned left when discussing political and social issues,” and claims that Llama 4 does better at courting the political right, in keeping with Zuckerberg’s embrace of Republican U.S. President Donald J. Trump and his party after the 2024 election.

What Llama 4 looks like so far

Meta’s Llama 4 models bring together efficiency, openness and high-end performance across multimodal and reasoning tasks.

With Scout and Maverick now publicly available, and Behemoth previewed as a state-of-the-art teacher model, the Llama ecosystem is positioned as a competitive open alternative to top-tier proprietary models from OpenAI, Anthropic, DeepSeek and Google.

Whether you are building enterprise-scale assistants or AI research pipelines, Llama 4 offers flexible, high-performance options with a clear orientation toward reasoning-first design.

