Mistral drops Voxtral Transcribe 2, an open-source speech model that runs on-device for pennies

Mistral AI, the Paris-based startup positioning itself as Europe's answer to OpenAI, released a pair of speech-to-text models on Wednesday that the company says can transcribe audio faster, more accurately, and far more cheaply than anything else on the market — all while running entirely on a smartphone or laptop.

The announcement marks the latest salvo in an increasingly competitive battle over voice AI, a technology that enterprise customers see as essential for everything from automated customer service to real-time translation. But unlike offerings from American tech giants, Mistral's new Voxtral Transcribe 2 models are designed to process sensitive audio without ever transmitting it to remote servers — a feature that could prove decisive for companies in regulated industries like healthcare, finance, and defense.

"You'd like your voice and the transcription of your voice to stay close to where you are, meaning you want it to happen on device—on a laptop, a phone, or a smartwatch," Pierre Stock, Mistral's vice president of science operations, said in an interview with VentureBeat. "We make that possible because the model is only 4 billion parameters. It's small enough to fit almost anywhere."

Mistral splits its new AI transcription technology into batch processing and real-time applications

Mistral released two distinct models under the Voxtral Transcribe 2 banner, each engineered for different use cases.

  • Voxtral Mini Transcribe V2 handles batch transcription, processing pre-recorded audio files in bulk. The company says it achieves the lowest word error rate of any transcription service and is available via API at $0.003 per minute, roughly one-fifth the price of major competitors. The model supports 13 languages, including English, Mandarin Chinese, Japanese, Arabic, Hindi, and several European languages.

  • Voxtral Realtime, as its name suggests, processes live audio with a latency that can be configured down to 200 milliseconds — the blink of an eye. Mistral claims this is a breakthrough for applications where even a two-second delay proves unacceptable: live subtitling, voice agents, and real-time customer service augmentation.

The Realtime model ships under an Apache 2.0 open-source license, meaning developers can download the model weights from Hugging Face, modify them, and deploy them without paying Mistral a licensing fee. For companies that prefer not to run their own infrastructure, API access costs $0.006 per minute.
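To put the quoted per-minute prices in perspective, here is a minimal sketch of the arithmetic. The two prices come from the figures above; the function name and the 1,000-hour monthly volume are hypothetical, chosen only for illustration.

```python
# Per-minute API prices quoted in the article (USD).
BATCH_PRICE_PER_MIN = 0.003     # Voxtral Mini Transcribe V2
REALTIME_PRICE_PER_MIN = 0.006  # Voxtral Realtime


def api_cost(hours: float, price_per_min: float) -> float:
    """Return the API cost in USD for `hours` of audio, rounded to cents."""
    return round(hours * 60 * price_per_min, 2)


if __name__ == "__main__":
    hours = 1_000  # hypothetical monthly audio volume, not from the article
    print(f"batch:    ${api_cost(hours, BATCH_PRICE_PER_MIN):,.2f}")
    print(f"realtime: ${api_cost(hours, REALTIME_PRICE_PER_MIN):,.2f}")
```

At these list prices, the real-time API costs twice as much per minute as the batch API. Self-hosting the open-weights Realtime model avoids the per-minute fee but shifts the cost to the company's own hardware.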

Stock said Mistral is betting on the open-source community to expand the model's reach. "The open-source community is very imaginative when it comes to applications," he said. "We're excited to see what they're going to do."

Why on-device AI processing matters for enterprises handling sensitive data

The decision to engineer models small enough to run locally reflects a calculation about where the enterprise market is heading. As companies integrate AI into ever more sensitive workflows — transcribing medical consultations, financial advisory calls, legal depositions — the question of where that data travels has become a dealbreaker.

Stock painted a vivid picture of the problem during his interview. Current note-taking applications with audio capabilities, he explained, often pick up ambient noise in problematic ways: "It might pick up the lyrics of the music in the background. It might pick up another conversation. It might hallucinate from a background noise."

Mistral invested heavily in training data curation and model architecture to address these issues. "All of that, we spend a lot of time ironing out the data and the way we train the model to robustify it," Stock said.

The company also added enterprise-specific features that its American competitors have been slower to implement. Context biasing allows customers to upload a list of specialized terminology — medical jargon, proprietary product names, industry acronyms — and the model will automatically favor those terms when transcribing ambiguous audio. Unlike fine-tuning, which requires retraining the model, context biasing works through a simple API parameter.

"You only need a text list," Stock explained. "And then the model will automatically bias the transcription toward these acronyms or these weird words. And it's zero shots, no need for retraining, no need for weird stuff."
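The article does not describe how Voxtral applies the term list internally, and the sketch below is not Mistral's method or API. It illustrates one generic way a text list can tilt a transcription without retraining: rescoring candidate transcripts so that those containing listed terms get a bonus. The function name, the scores, and the boost value are all hypothetical.

```python
def rescore(hypotheses, bias_terms, boost=2.0):
    """Return the best transcript after boosting ones that contain bias terms.

    hypotheses: list of (text, score) pairs; a higher score means more likely.
    bias_terms: user-supplied vocabulary list (the "text list").
    boost:      hypothetical bonus added for each distinct matched term.
    """
    terms = {t.lower() for t in bias_terms}

    def biased_score(item):
        text, score = item
        words = set(text.lower().split())
        return score + boost * len(terms & words)

    return max(hypotheses, key=biased_score)[0]


# Two made-up candidates for the same ambiguous audio; the first scores higher.
candidates = [
    ("the patient takes met for men daily", -4.1),
    ("the patient takes metformin daily", -4.8),
]

print(rescore(candidates, []))             # no list: the higher raw score wins
print(rescore(candidates, ["metformin"]))  # with list: the listed term wins
```

The point of the illustration is that the term list only changes which candidate is preferred at transcription time; no model weights are updated.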

From factory floors to call centers, Mistral targets high-noise industrial environments

Stock described two scenarios that capture how Mistral envisions the technology being deployed.

The first involves industrial auditing. Imagine technicians walking through a manufacturing facility, inspecting heavy machinery while shouting observations over the din of factory noise. "In the end, imagine like a perfect timestamped notes identifying who said what — so diarization — while being super robust," Stock said. The challenge is handling what he called "weird technical language that no one is able to spell except these people."

The second scenario targets customer service operations. When a caller contacts a support center, Voxtral Realtime can transcribe the conversation in real time, feeding text to backend systems that pull up relevant customer records before the caller finishes explaining the problem.

"The status will appear for the operator on the screen before the customer stops the sentence and stops complaining," Stock explained. "Which means you can just interact and say, 'Okay, I can see the status. Let me correct the address and send back the shipment.'"

He estimated this could reduce typical customer service interactions from multiple back-and-forth exchanges to just two interactions: the customer explains the problem, and the agent resolves it immediately.

Real-time translation across languages could arrive by the end of 2026

For all the focus on transcription, Stock made clear that Mistral views these models as foundational technology for a more ambitious goal: real-time speech-to-speech translation that feels natural.

"Maybe the end goal application and what the model is laying the groundwork for is live translation," he said. "I speak French, you speak English. It's key to have minimal latency, because otherwise you don't build empathy. Your face is not out of sync with what you said one second ago."

That goal puts Mistral in direct competition with Apple and Google, both of which have been racing to solve the same problem. Google's latest translation model operates at a two-second delay, ten times the 200-millisecond latency Mistral claims for Voxtral Realtime.

Mistral positions itself as the privacy-first alternative for enterprise customers

Mistral occupies an unusual position in the AI landscape. Founded in 2023 by alumni of Meta and Google DeepMind, the company has raised over $2 billion and now carries a valuation of approximately $13.6 billion. Yet it operates with a fraction of the compute resources available to American hyperscalers — and has built its strategy around efficiency rather than brute force.

"The models we release are enterprise grade, industry leading, efficient — in particular, in terms of cost — can be embedded into the edge, unlocks privacy, unlocks control, transparency," Stock said.

That approach has resonated particularly with European customers wary of dependence on American technology. In January, France's Ministry of the Armed Forces signed a framework agreement giving the country's military access to Mistral's AI models—a deal that explicitly requires deployment on French-controlled infrastructure.

"I think a big barrier to adoption of voice AI is that, hey, if you're in a sensitive industry like finance or in manufacturing or healthcare or insurance, you can't have information you're talking about just go to the cloud," Howard Cohen, who participated in the interview alongside Stock, noted. "It needs to be either on device or needs to be on your premise."

Mistral faces stiff competition from OpenAI, Google, and a rising China

The transcription market has grown fiercely competitive. OpenAI's Whisper model has become something of an industry standard, available both through API and as downloadable open-source weights. Google, Amazon, and Microsoft all offer enterprise-grade speech services. Specialized players like Assembly AI and Deepgram have built substantial businesses serving developers who need reliable, scalable transcription.

Mistral claims its new models outperform all of them on accuracy benchmarks while undercutting them on price. "We are better than them on the benchmarks," Stock said. Independent verification of those claims will take time, but the company points to performance on FLEURS, a widely used multilingual speech benchmark, where Voxtral models achieve word error rates competitive with or superior to alternatives from OpenAI and Google.
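For readers unfamiliar with the metric, word error rate is the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal implementation of that standard definition. It is not the FLEURS evaluation pipeline, which also normalizes text before scoring, and the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word is missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is dropped in the hypothesis.
print(word_error_rate("send back the shipment", "send the shipment"))
```

Lower is better, and a rate of zero means the transcript matches the reference word for word.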

Perhaps more significantly, Mistral's CEO Arthur Mensch has warned that American AI companies face pressure from an unexpected direction. Speaking at the World Economic Forum in Davos last month, Mensch dismissed the notion that Chinese AI lags behind the West as "a fairy tale."

"The capabilities of China's open-source technology is probably stressing the CEOs in the US," he said.

The French startup bets that trust will determine the winner in enterprise voice AI

Stock predicted that 2026 would be "the year of note-taking" — the moment when AI transcription becomes reliable enough that users trust it completely.

"You need to trust the model, and the model basically cannot make any mistake, otherwise you would just lose trust in the product and stop using it," he said. "The threshold is super, super hard."

Whether Mistral has crossed that threshold remains to be seen. Enterprise customers will be the ultimate judges, and they tend to move slowly, testing claims against reality before committing budgets and workflows to new technology. The audio playground in Mistral Studio, where developers can test Voxtral Transcribe 2 with their own files, went live today.

But Stock's broader argument deserves attention. In a market where American giants compete by throwing billions of dollars at ever-larger models, Mistral is making a different wager: that in the age of AI, smaller and local might beat bigger and distant. For the executives who spend their days worrying about data sovereignty, regulatory compliance, and vendor lock-in, that pitch may prove more compelling than any benchmark.

The race to dominate enterprise voice AI is no longer just about who builds the most powerful model. It's about who builds the model you're willing to let listen.






