{*}
Add news
March 2010 April 2010 May 2010 June 2010 July 2010
August 2010
September 2010 October 2010 November 2010 December 2010 January 2011 February 2011 March 2011 April 2011 May 2011 June 2011 July 2011 August 2011 September 2011 October 2011 November 2011 December 2011 January 2012 February 2012 March 2012 April 2012 May 2012 June 2012 July 2012 August 2012 September 2012 October 2012 November 2012 December 2012 January 2013 February 2013 March 2013 April 2013 May 2013 June 2013 July 2013 August 2013 September 2013 October 2013 November 2013 December 2013 January 2014 February 2014 March 2014 April 2014 May 2014 June 2014 July 2014 August 2014 September 2014 October 2014 November 2014 December 2014 January 2015 February 2015 March 2015 April 2015 May 2015 June 2015 July 2015 August 2015 September 2015 October 2015 November 2015 December 2015 January 2016 February 2016 March 2016 April 2016 May 2016 June 2016 July 2016 August 2016 September 2016 October 2016 November 2016 December 2016 January 2017 February 2017 March 2017 April 2017 May 2017 June 2017 July 2017 August 2017 September 2017 October 2017 November 2017 December 2017 January 2018 February 2018 March 2018 April 2018 May 2018 June 2018 July 2018 August 2018 September 2018 October 2018 November 2018 December 2018 January 2019 February 2019 March 2019 April 2019 May 2019 June 2019 July 2019 August 2019 September 2019 October 2019 November 2019 December 2019 January 2020 February 2020 March 2020 April 2020 May 2020 June 2020 July 2020 August 2020 September 2020 October 2020 November 2020 December 2020 January 2021 February 2021 March 2021 April 2021 May 2021 June 2021 July 2021 August 2021 September 2021 October 2021 November 2021 December 2021 January 2022 February 2022 March 2022 April 2022 May 2022 June 2022 July 2022 August 2022 September 2022 October 2022 November 2022 December 2022 January 2023 February 2023 March 2023 April 2023 May 2023 June 2023 July 2023 August 2023 September 2023 October 2023 November 2023 December 2023 January 2024 February 2024 March 2024 April 2024 May 2024 June 2024 July 2024 August 2024 September 2024 October 2024 November 2024 December 2024 January 2025 February 2025 March 2025 April 2025 May 2025 June 2025 July 2025 August 2025 September 2025 October 2025 November 2025 December 2025 January 2026 February 2026 March 2026 April 2026
1 2 3 4 5 6 7 8 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
News Every Day |

Goodbye, Llama? Meta launches new proprietary AI model Muse Spark — first since Superintelligence Labs' formation

Meta has been one of the most interesting companies of the generative AI era — initially gaining a loyal and huge following of users for the release of its mostly open source Llama family of large language models (LLMs) beginning in early 2023 but coming to screeching halt last year after Llama 4 debuted to mixed reviews and ultimately, admissions of gaming benchmarks.

That bumpy rollout of Llama 4 apparently spurred Meta founder and CEO Mark Zuckerberg to totally overhaul Meta's AI operations in the summer of 2025, forming a new internal division, Meta Superintelligence Labs (MSL) which he recruited 29-year-old former Scale AI co-founder and CEO Alexandr Wang to lead as Chief AI Officer.

Now, today, Meta is showing us the fruits of that effort: Muse Spark, a new proprietary model that Wang says (posting on rival social network X, used more often by the machine learning community) is "the most powerful model that meta has released," and has "support for tool-use, visual chain of thought, & multi-agent orchestration." He also says it will be the start of a new Muse family of models, raising questions about what will become of Meta's popular lineup and ongoing development of the Llama family.

It arrives not as a generic chatbot, but as the foundation for what Wang calls "personal superintelligence"—an AI that doesn’t just process text but "sees and understands the world around you" to act as a digital extension of the self, echoing Zuckberg's public manifesto for a vision of personal superintelligence published in summer 2025.

However, it is proprietary only — confined for now to the Meta AI app and website, as well as a " private API preview to select users," according to Meta's blog post announcing it — a move likely to rankle the literally billions of users of Llama models and the thousands of developers who relied upon it (some of whom are active participants in rival social network Reddit's r/LocalLLaMA subreddit). In addition, no pricing information for the model has yet been announced.

It's unclear if Meta has ended development on the Llama family entirely. When asked directly by VentureBeat, a Meta spokesperson said in an email: “Our current Llama models will continue to be available as open source,” which doesn’t address the question of development of future Llama models.

Visual chain-of-thought

At its core, Muse Spark is a natively multimodal reasoning model. Unlike previous iterations that "stitched" vision and text together, Muse Spark was rebuilt from the ground up to integrate visual information across its internal logic. This architectural shift enables "visual chain of thought," allowing the model to annotate dynamic environments—identifying the components of a complex espresso machine or correcting a user's yoga form via side-by-side video analysis.

The most significant technical leap, however, is a new "Contemplating" mode. This feature orchestrates multiple sub-agents to reason in parallel, allowing Meta to compete with extreme reasoning models like Google's Gemini Deep Think and OpenAI's GPT-5.4 Pro.

In benchmarks, this mode achieved 58% in "Humanity’s Last Exam" and 38% in "FrontierScience Research," figures that Meta claims validate their new scaling trajectory.

Perhaps more impressive for the company’s bottom line is the model’s efficiency. Meta reports that Muse Spark achieves its reasoning capabilities using over an order of magnitude less compute than Llama 4 Maverick, its previous mid-size flagship. This efficiency is driven by a process called "thought compression". During reinforcement learning, the model is penalized for excessive "thinking time," forcing it to solve complex problems with fewer reasoning tokens without sacrificing accuracy.

Benchmarks reveal a return-to-form

The launch of Muse Spark is framed as a statistical "quantum leap," ending Meta’s year-long absence from the absolute frontier of AI performance.

By reconciling Meta’s official internal data with independent auditing from third-party LLM tracking firm Artificial Analysis, a clear picture emerges: Muse Spark is not just a marginal improvement over the Llama series; it is a fundamental re-entry into the "Top 5" global models.

According to the Artificial Analysis Intelligence Index v4.0, Muse Spark achieved a score of 52. For context, Meta’s previous flagship, Llama 4 Maverick, debuted in 2025 with an Index score of just 18.

By nearly tripling its performance, Muse Spark now sits within striking distance of the industry’s most elite systems, trailing only Gemini 3.1 Pro Preview (57), GPT-5.4 (57), and Claude Opus 4.6 (53).

Meta’s official benchmarks suggest that Muse Spark is particularly dominant in multimodal reasoning, specifically where visual figures and logic intersect.

  • CharXiv Reasoning: In "figure understanding," Muse Spark achieved a score of 86.4, significantly outperforming Claude Opus 4.6 (65.3), Gemini 3.1 Pro (80.2), and GPT-5.4 (82.8).

  • MMMU Pro: Official reports place the model at 80.4, while Artificial Analysis’s independent audit measured it at 80.5%. This makes it the second-most capable vision model on the market, surpassed only by Gemini 3.1 Pro Preview (83.9% official; 82.4% independent).

  • Visual Factuality (SimpleVQA): Muse Spark scored 71.3, placing it ahead of GPT-5.4 (61.1) and Grok 4.2 (57.4), though it narrowly trails Gemini 3.1 Pro (72.4).

These scores validate Meta’s focus on "visual chain of thought," enabling the model to not just recognize objects, but to reason through complex spatial problems and dynamic annotations.

The "Thinking" gear of Muse Spark was put to the test against specialized benchmarks designed to break non-reasoning models.

  • Humanity’s Last Exam (HLE): In this multidisciplinary evaluation, Meta reports a score of 42.8 (No Tools) and 50.4 (With Tools). Independent audits by Artificial Analysis tracked the model at 39.9%, trailing Gemini 3.1 Pro Preview (44.7%) and GPT-5.4 (41.6%).

  • GPQA Diamond (PhD Level Reasoning): Muse Spark achieved a formidable 89.5, surpassing Grok 4.2 (88.5) but trailing the specialized "max reasoning" outputs of Opus 4.6 (92.7) and Gemini 3.1 Pro (94.3).

  • ARC AGI 2: This remains a notable weak point. Muse Spark scored 42.5, far behind the abstract reasoning puzzles solved by Gemini 3.1 Pro (76.5) and GPT-5.4 (76.1).

  • CritPT (Physics Research): Independent auditing found Muse Spark achieved the 5th highest score at 11%. This marks a substantial lead over Gemini 3 Flash (9%) and Claude 4.6 Sonnet (3%).

One of the most striking results from the official data is Muse Spark's performance in the health sector, likely a result of Meta's collaboration with over 1,000 physicians.

  • HealthBench Hard: Muse Spark achieved 42.8, a massive lead over Claude Opus 4.6 (14.8), Gemini 3.1 Pro (20.6), and even GPT-5.4 (40.1).

  • MedXpertQA (Multimodal): It scored 78.4, comfortably ahead of Opus 4.6 (64.8) and Grok 4.2 (65.8), though it still trails Gemini 3.1 Pro’s top-tier score of 81.3.

Agentic Systems and Efficiency: The "Thought Compression" Effect

While Muse Spark excels at reasoning, its "agentic" performance—executing real-world work tasks—presents a more nuanced picture.

  • SWE-Bench Verified: Muse Spark scored 77.4, trailing Claude Opus 4.6 (80.8) and Gemini 3.1 Pro (80.6).

  • GDPval-AA Elo: Meta’s official score of 1444 differs slightly from Artificial Analysis’s recorded 1427. In both cases, Muse Spark trails GPT-5.4 (1672) and Opus 4.6 (1606), suggesting that while the model "thinks" well, it is still refining its ability to "act" in long-horizon software and office workflows.

  • Token Efficiency: This is where Muse Spark distinguishes itself. To run the Intelligence Index, it used 58 million output tokens. In contrast, Claude Opus 4.6 required 157 million tokens and GPT-5.4 required 120 million. This supports Meta's claim of "thought compression"—delivering frontier-class intelligence while using less than half the "thinking time" of its closest competitors.

Benchmark

Llama 4 Maverick (2025)

Muse Spark (Official)

Gemini 3.1 Pro (Official)

Intelligence Index Score

18

52

57

MMMU Pro

--

80.4

83.9

CharXiv Reasoning

--

86.4

80.2

HealthBench Hard

--

42.8

20.6

License

Open-Weights

Proprietary

Proprietary

With Muse Spark, Meta has successfully transitioned from being the "LAMP stack for AI" to a direct challenger for the title of "Personal Superintelligence". While agentic workflows remain a hurdle, its dominance in vision, health, and token efficiency places Meta back at the center of the frontier race.

Personal wellness and Instagram shopping

Meta is immediately deploying Muse Spark to power specialized experiences across its app family.

  • Shopping Mode: A new feature that leverages Meta’s vast creator ecosystem. The AI picks up on brands, styling choices, and content across Instagram and Threads to provide personalized recommendations, effectively turning every post into a shoppable interaction.

  • Health Reasoning: In a move toward medical utility, Meta collaborated with over 1,000 physicians to curate training data. Muse Spark can now analyze nutritional content from photos of food or provide "health scores" for pescatarian diets with high cholesterol.

  • Interactive UI: The model can generate web-based minigames or tutorials on the fly. For example, a user can prompt the AI to turn a photo into a playable Sudoku game or a highlights-based tutorial for home appliances.

Evaluation awareness

While Muse Spark demonstrates strong refusal behaviors regarding biological and chemical weapons, its safety profile includes a startling new discovery. Third-party testing by Apollo Research found that the model possesses a high degree of "evaluation awareness".

The model frequently recognized when it was being tested in "alignment traps" and reasoned that it should behave honestly specifically because it was under evaluation.

While Meta concluded this was not a "blocking concern" for release, the finding suggests that frontier models are becoming increasingly "conscious" of the testing environment—potentially rendering traditional safety benchmarks less reliable as models learn to "game" the exam.

What happens to Llama?

In February 2023, Meta released Llama 1 to demonstrate that smaller, compute-optimal models could match larger counterparts like GPT-3 in efficiency. Although access was initially restricted to researchers, the model weights were leaked via 4chan on March 3, 2023, an event that inadvertently democratized high-tier research and catalyzed a global movement for running models on consumer-grade hardware.

This shift was solidified in July 2023 with the release of Llama 2, which introduced a commercial license that permitted self-hosting for most organizations. This approach saw rapid adoption, with the Llama family exceeding 100 million downloads and supporting over 1,000 commercial applications by the third quarter of 2023.

Through 2024 and 2025, Meta scaled the Llama family to establish it as the essential infrastructure for global enterprise AI, frequently referred to as the LAMP stack for AI. Following the launch of Llama 3 in April 2024 and the landmark Llama 3.1 405B in July, Meta achieved performance parity with the world's leading proprietary systems.

The subsequent release of Llama 4 in April 2025 introduced a Mixture-of-Experts architecture, allowing for massive parameter scaling while maintaining fast inference speeds. By early 2026, the Llama ecosystem reached a staggering scale, totaling 1.2 billion downloads and averaging approximately one million downloads per day.

This widespread adoption provided businesses with significant economic sovereignty, as self-hosting Llama models offered an 88% cost reduction compared to using proprietary API providers.

As of April 2026, Meta’s role as the undisputed leader of the open-weight movement has transitioned into a highly contested multi-polar landscape characterized by the rise of international competitors.

While the United States accounts for 35% of global Llama deployments, Chinese models from labs like Alibaba and DeepSeek began accounting for 41% of downloads on platforms like Hugging Face by late 2025. Throughout early 2026, new entrants such as Zhipu AI’s GLM-5 and Alibaba’s Qwen 3.6 Plus have outpaced Llama 4 Maverick on general knowledge and coding benchmarks.

In response to this global pressure, Meta's Muse Spark arrives with hefty expectations and an open source legacy that will be tough to live up to.

Proprietary only (for now)

The launch marks a controversial departure from Meta AI's "open science" roots. While the Llama series was famously accessible to developers, Muse Spark is launching as a proprietary model.

Wang addressed the shift on X, stating: "Nine months ago we rebuilt our ai stack from scratch. New infrastructure, new architecture, new data pipelines... This is step one. Bigger models are already in development with plans to open-source future versions."

However, the developer community remains skeptical. Some see this as a necessary pivot after the Llama 4 series failed to gain expected developer traction; others view it as Meta "closing the gates" now that it has a competitive reasoning model.

Wang himself acknowledged the transition’s difficulty, noting there are "certainly rough edges we will polish over time".

For the 3 billion people using Meta’s apps, the change will be felt almost instantly. The AI they interact with is no longer just a library of information, but an agent with a $27 billion brain and a mandate to understand their world as intimately as they do.

Ria.city






Read also

Only Losers Play the Madman

Frosted Flakes Drops Michigan Wolverines National Championship Cereal Box

HereeNo Excuses: Ro Khanna Torches Fox: MAGA Pouts Over Trump Destroying GOP Midterms Chances s How To Remove An Unfit President From Office

News, articles, comments, with a minute-by-minute update, now on Today24.pro

Today24.pro — latest news 24/7. You can add your news instantly now — here




Sports today


Новости тенниса


Спорт в России и мире


All sports news today





Sports in Russia today


Новости России


Russian.city



Губернаторы России









Путин в России и мире







Персональные новости
Russian.city





Friends of Today24

Музыкальные новости

Персональные новости