
Open source Mamba 3 arrives to surpass Transformer architecture with nearly 4% improved language modeling, reduced latency

The generative AI era began for most people with the launch of OpenAI's ChatGPT in late 2022, but the underlying technology — the "Transformer" neural network architecture that allows AI models to weigh the importance of different words in a sentence (or pixels in an image) differently and train on information in parallel — dates back to Google's seminal 2017 paper "Attention Is All You Need."

Yet while Transformers deliver unparalleled model quality and have underpinned most of the major generative AI models used today, they are computationally gluttonous. They are burdened by quadratic compute and linear memory demands that make large-scale inference an expensive, often prohibitive, endeavor. Hence, the desire by some researchers to improve on them by developing a new architecture, Mamba, in 2023, which has gone on to be included in hybrid Mamba-Transformer models like Nvidia's Nemotron 3 Super.

Now, the same researchers behind the original Mamba architecture, led by Albert Gu of Carnegie Mellon and Tri Dao of Princeton, have released the latest version, Mamba-3, as a language model under the permissive Apache 2.0 open-source license, making it immediately available to developers, including enterprises, for commercial use. A technical paper has also been published on arXiv.org.

This model signals a paradigm shift from training efficiency to an "inference-first" design. As Gu noted in the official announcement, while Mamba-2 focused on breaking pretraining bottlenecks, Mamba-3 aims to solve the "cold GPU" problem: the reality that during decoding, modern hardware often remains idle, waiting for memory movement rather than performing computation.

Perplexity (no, not the company) and the newfound efficiency of Mamba 3

Mamba, including Mamba 3, is a type of State Space Model (SSM).

An SSM is effectively a high-speed "summary machine" for AI. While many popular models (like the ones behind ChatGPT) must re-examine every single word they’ve already seen to understand what comes next, which gets slower and more expensive the longer the conversation lasts, an SSM maintains a compact, ever-changing internal state. This state is essentially a digital "mental snapshot" of the entire history of the data.

As new information flows in, the model simply updates this snapshot instead of re-reading everything from the beginning. This allows the AI to process massive amounts of information, like entire libraries of books or long strands of DNA, with incredible speed and much lower memory requirements.
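That update loop can be sketched in a few lines. The matrices and inputs below are toy values chosen purely for illustration; a real SSM like Mamba learns these parameters (and makes them input-dependent), but the shape of the computation is the same:

```python
import numpy as np

def ssm_scan(A, B, C, xs):
    """Minimal linear state-space recurrence: the state h is a fixed-size
    summary that is updated once per input, instead of re-reading history."""
    h = np.zeros(A.shape[0])
    ys = []
    for x in xs:                 # one cheap update per token
        h = A @ h + B * x        # fold the new input into the summary
        ys.append(C @ h)         # read the output from the summary
    return np.array(ys)

# Memory stays O(state size) no matter how long the sequence is,
# unlike a Transformer's KV cache, which grows with every token.
A = np.eye(2) * 0.9              # toy transition: gentle decay
B = np.array([1.0, 0.5])         # toy input projection
C = np.array([1.0, -1.0])        # toy output projection
ys = ssm_scan(A, B, C, [1.0, 0.0, 2.0])
```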

To appreciate the leap Mamba-3 represents, one must first understand perplexity, the primary metric used in the research to measure model quality.

In the context of language modeling, perplexity is a measure of how "surprised" a model is by new data.

Think of a model as a professional gambler. If a model has high perplexity, it is unsure where to place its bets; it sees many possible next words as equally likely.

A lower perplexity score indicates that the model is more "certain"—it has a better grasp of the underlying patterns of human language. For AI builders, perplexity serves as a high-fidelity proxy for intelligence.

The breakthrough reported in the Mamba-3 research is that it achieves comparable perplexity to its predecessor, Mamba-2, while using only half the state size. This means a model can be just as smart while being twice as efficient to run.
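Concretely, perplexity is the exponential of the average negative log-probability the model assigned to the tokens that actually occurred. A toy calculation (the probabilities here are made up for illustration):

```python
import math

def perplexity(probs):
    """Perplexity = exp(average negative log-likelihood the model
    assigned to each token that actually came next)."""
    nll = -sum(math.log(p) for p in probs) / len(probs)
    return math.exp(nll)

# A confident model puts high probability on each true next token...
low = perplexity([0.9, 0.8, 0.95])    # close to 1: low surprise
# ...while a hedging model spreads its bets thin.
high = perplexity([0.1, 0.05, 0.2])   # 10.0: the geometric-mean
                                      # probability per token is 0.1
```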

A new philosophy

The philosophy guiding Mamba-3 is a fundamental shift in how we think about AI "intelligence" versus the speed of the hardware it runs on. While the previous generation, Mamba-2, was designed to be trained at record-breaking speeds, Mamba-3 is an "inference-first" architecture — inference referring to the way AI models are served to end users, through websites like ChatGPT or Google Gemini, or through application programming interfaces (APIs).

Mamba 3's primary goal is to maximize every second the computer chip (GPU) is active, ensuring that the model is thinking as hard as possible without making the user wait for an answer.

In the world of language models, every point of accuracy is hard-won. At the 1.5-billion-parameter scale, the most advanced "MIMO" variant of Mamba-3 achieved a 57.6% average accuracy across benchmarks, representing a 2.2-percentage-point leap over the industry-standard Transformer.

While a two-point jump might sound modest, it actually represents a nearly 4% relative increase in language modeling capability compared to the Transformer baseline. Even more impressively, as alluded to above, Mamba-3 can match the predictive quality of its predecessor while using only half the internal "state size," effectively delivering the same level of intelligence with significantly less memory lag.

For years, efficient alternatives to Transformers suffered from a "logic gap"—they often failed at simple reasoning tasks, like keeping track of patterns or solving basic arithmetic, because their internal math was too rigid. Mamba-3 solves this by introducing complex-valued states.

This mathematical upgrade acts like an internal compass, allowing the model to represent "rotational" logic. By using this "rotary" approach, Mamba-3 can near-perfectly solve logic puzzles and state-tracking tasks that its predecessors could only guess at, finally bringing the reasoning power of linear models on par with the most advanced systems.

The final piece of the puzzle is how Mamba-3 interacts with physical hardware. Most AI models today are "memory-bound," meaning the computer chip spends most of its time idle, waiting for data to move from memory to the processor.

Mamba-3 introduces a Multi-Input, Multi-Output (MIMO) formulation that fundamentally changes this dynamic. By performing up to four times more mathematical operations in parallel during each step, Mamba-3 utilizes that previously "idle" power. This allows the model to do significantly more "thinking" for every word it generates without increasing the actual time a user spends waiting for a response. More on these below.

Three new technological leaps

The appeal of linear models has always been their constant memory requirements and linear compute scaling.

However, as the Mamba-3 authors point out, there is "no free lunch." By fixing the state size to ensure efficiency, these models are forced to compress all historical context into a single representation—the exact opposite of a Transformer’s ever-growing KV cache. Mamba-3 pulls three specific levers to make that fixed state do more work.

1. Exponential-Trapezoidal Discretization

State Space Models are fundamentally continuous-time systems that must be "discretized" to handle the discrete sequences of digital data.

Previous iterations relied on "Exponential-Euler" discretization—a heuristic that provided only a first-order approximation of the system.

Mamba-3 introduces a generalized trapezoidal rule, providing second-order accurate approximation. This isn't just a mathematical refinement; it induces an "implicit convolution" within the core recurrence.

By combining this with explicit B and C bias terms, the researchers were able to remove the short causal convolution that has been a staple of recurrent architectures for years.
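As a rough sketch of the distinction, here are the textbook forms of the two integration rules applied to the underlying continuous system (this is the generic math, not necessarily the paper's exact parameterization). Writing $\bar{A}_t = e^{\Delta_t A}$, the Euler-style rule feeds only the current input into each step:

```latex
h_t = \bar{A}_t\, h_{t-1} + \Delta_t B_t x_t
\quad \text{(exponential-Euler, first-order)}
```

while a trapezoidal rule averages the input over both endpoints of the step:

```latex
h_t = \bar{A}_t\, h_{t-1}
    + \tfrac{\Delta_t}{2}\left(\bar{A}_t B_{t-1} x_{t-1} + B_t x_t\right)
\quad \text{(trapezoidal, second-order)}
```

The dependence on both $x_{t-1}$ and $x_t$ is the "implicit convolution" in the recurrence: each state update already mixes adjacent inputs, which is what lets the architecture drop the separate short causal convolution.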

2. Complex-Valued SSMs and the "RoPE Trick"

One of the most persistent criticisms of linear models has been their inability to solve simple state-tracking tasks, such as determining the parity of a bit sequence.

This failure stems from restricting the transition matrix to real numbers, which prevents the model from representing "rotational" dynamics.

Mamba-3 overcomes this by viewing the underlying SSM as complex-valued.

Using what the team calls the "RoPE trick," they demonstrate that a complex-valued state update is mathematically equivalent to a data-dependent rotary embedding (RoPE) applied to the input and output projections.

This allows Mamba-3 to solve synthetic reasoning tasks that were impossible for Mamba-2.
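To see why rotation matters, consider the classic parity task. The toy sketch below (illustrative only, not the paper's actual mechanism) tracks whether a bit stream has seen an even or odd number of 1s using a fixed-size state: each '1' rotates a unit-length complex state by pi, and rotating by pi is multiplication by e^{i*pi} = -1, something a purely real, decay-only transition cannot represent:

```python
import numpy as np

def parity_via_rotation(bits):
    """Track parity (even/odd count of 1s) with a single complex state.
    Each '1' applies a rotation by pi; a real, decay-only state update
    has no way to express this sign flip."""
    h = complex(1.0, 0.0)               # unit state on the complex circle
    for b in bits:
        if b:                            # data-dependent rotation angle
            h *= np.exp(1j * np.pi)      # rotate by pi per '1' seen
    return 0 if h.real > 0 else 1        # near +1 -> even, near -1 -> odd

print(parity_via_rotation([1, 0, 1, 1]))  # 3 ones -> odd -> 1
```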

3. MIMO: Boosting Arithmetic Intensity

The most significant leap in inference efficiency comes from the transition from Single-Input, Single-Output (SISO) to Multi-Input, Multi-Output (MIMO) SSMs.

In a standard SSM, the state update is an outer-product operation that is heavily memory-bound.

By switching to a matrix-multiplication-based state update, Mamba-3 increases the "arithmetic intensity" of the model—the ratio of FLOPs to memory traffic.

This allows the model to perform more computation during the memory-bound decoding phase. Essentially, Mamba-3 utilizes the "idle" compute cores of the GPU to increase model power for "free," maintaining the same decoding speed as its simpler predecessors.
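The difference can be sketched with toy shapes (the sizes and function names here are hypothetical, not the paper's actual kernel): both updates read and write the same amount of state memory, but the MIMO version performs roughly r times the arithmetic per step:

```python
import numpy as np

d_state, d_head, r = 64, 64, 4          # hypothetical sizes

S = np.zeros((d_state, d_head))         # fixed-size SSM state matrix

def siso_step(S, b, x):
    """Single-input step: a rank-1 outer-product update. The chip moves
    the whole state but performs only d_state*d_head multiply-adds."""
    return S + np.outer(b, x)           # b: (d_state,), x: (d_head,)

def mimo_step(S, B, X):
    """Multi-input step: a rank-r matmul update. Same state traffic,
    but r times the multiply-adds -- higher arithmetic intensity, so
    otherwise-idle compute units do useful work while decoding."""
    return S + B @ X                    # B: (d_state, r), X: (r, d_head)

S1 = siso_step(S, np.ones(d_state), np.ones(d_head))
S2 = mimo_step(S, np.ones((d_state, r)), np.ones((r, d_head)))
print(S1.shape, S2.shape)  # both (64, 64): identical memory footprint
```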

What Mamba 3 means for enterprises and AI builders

For enterprises, Mamba-3 represents a strategic shift in the total cost of ownership (TCO) for AI deployments.

  • Cost vs. Performance: At matched parameter counts, Mamba-3 (MIMO) matches the perplexity of Mamba-2 while using half the state size. For enterprise deployment, this effectively doubles inference throughput on the same hardware footprint.

  • Agentic Workflows: As organizations move toward parallel, agentic workflows (like automated coding or real-time customer service agents), the demand for low-latency generation increases exponentially. Mamba-3 is designed specifically to prevent GPU hardware from sitting "cold" during these tasks.

  • The Hybrid Advantage: The researchers predict that the future of enterprise AI lies in hybrid models. By interleaving Mamba-3 with self-attention, organizations can combine the efficient "memory" of SSMs with the precise "database" storage of Transformers.

Availability, licensing, and usage

Mamba-3 is not merely a theoretical research paper; it is a fully realized, open-source release available for immediate use, with model code published on GitHub.

The project is released under the Apache-2.0 License. This is a permissive, business-friendly license that allows for free usage, modification, and commercial distribution without requiring the disclosure of proprietary source code.

This release is good for developers building long-context applications, real-time reasoning agents, or those seeking to reduce GPU costs in high-volume production environments.

Leading the State Space Models (SSM) revolution

The release was met with enthusiasm on social media, particularly regarding the "student-led" nature of the project. Gu, whose X/Twitter bio describes him as "leading the ssm revolution," gave full credit to the student leads, including Aakash Lahoti and Kevin Y. Li.

Gu’s thread highlighted the team’s satisfaction with the design:

"We’re quite happy with the final model design! The three core methodological changes are inspired by (imo) some elegant math and methods."

As agentic workflows push inference demand "through the roof," the arrival of Mamba-3 suggests that the future of AI may not just be about having the biggest model, but about having the most efficient one.

Mamba-3 has successfully re-aligned the SSM with the realities of modern hardware, proving that even in the age of the Transformer, the principles of classical control theory still have a vital role to play.
