Why AI Progress Is Increasingly Invisible

OpenAI co-founder Ilya Sutskever made waves in November when he suggested that advancements in AI were slowing down, explaining that simply scaling up AI models was no longer delivering proportional performance gains.

Sutskever’s comments came on the heels of reports in The Information and Bloomberg that Google and Anthropic were experiencing similar slowdowns. This led to a wave of articles declaring that AI progress had hit a wall, lending further credence to an increasingly widespread feeling that chatbot capabilities haven’t improved significantly since OpenAI released GPT-4 in March 2023.

On Dec. 20, OpenAI announced o3, its latest model, and reported new state-of-the-art performance on a number of the most challenging technical benchmarks out there, in many cases improving on the previous high score by double-digit percentage points. I believe that o3 signals that we are in a new paradigm of AI progress. And François Chollet, a co-creator of the prominent ARC-AGI benchmark whom some consider an AI scaling skeptic, writes that the model represents a “genuine breakthrough.”

However, in the weeks after OpenAI announced o3, many mainstream news sites made no mention of the new model. Around the time of the announcement, readers would find headlines at the Wall Street Journal, WIRED, and the New York Times suggesting AI was actually slowing down. The muted media response suggests that there is a growing gulf between what AI insiders are seeing and what the public is told.

Indeed, AI progress hasn’t stalled—it’s just become invisible to most people.

Automating behind-the-scenes research

First, AI models are getting better at answering complex questions. For example, in June 2023, the best AI model barely scored better than chance on the hardest set of “Google-proof” PhD-level science questions. In September, OpenAI’s o1 model became the first AI system to surpass the scores of human domain experts. And in December, OpenAI’s o3 model improved on those scores by roughly another 10 percentage points.

However, the vast majority of people won’t notice this kind of improvement because they aren’t doing graduate-level science work. But it will be a huge deal if AI starts meaningfully accelerating research and development in scientific fields, and there is some evidence that such an acceleration is already happening. A groundbreaking paper by Aidan Toner-Rodgers at MIT recently found that materials scientists assisted by AI systems “discover 44% more materials, resulting in a 39% increase in patent filings and a 17% rise in downstream product innovation.” At the same time, 82% of the scientists in the study reported that the AI tools reduced their job satisfaction, mainly citing “skill underutilization and reduced creativity.”

But the Holy Grail for AI companies is a system that can automate AI research itself, theoretically enabling an explosion in capabilities that drives progress across every other domain. The recent improvements on this front may be even more dramatic than those in the hard sciences.

In an attempt to provide more realistic tests of AI programming capabilities, researchers developed SWE-Bench, a benchmark that evaluates how well AI agents can fix actual open problems in popular open-source software. A year ago, the top score on the verified version of the benchmark was 4.4%. Today, the top score is close to 72%, achieved by OpenAI’s o3 model.

This remarkable improvement, from struggling with even the simplest fixes to successfully handling nearly three-quarters of a set of real-world coding tasks, suggests that AI systems are rapidly gaining the ability to understand and modify complex software projects. This marks a crucial step toward automating significant portions of software research and development, and the process appears to be well underway: Google’s CEO recently told investors that “more than a quarter of all new code at Google is generated by AI.”

Much of this progress has been driven by improvements to the “scaffolding” built around AI models like GPT-4o, which increases the models’ autonomy and their ability to interact with the world. Even without further improvements to base models, better scaffolding can make AI significantly more capable and agentic, a word researchers use to describe an AI model that can act autonomously, make decisions, and adapt to changing circumstances. AI agents are often given the ability to use tools and take multi-step actions on a user’s behalf. Transforming passive chatbots into agents became a core focus of the industry only in the last year, and progress has been swift.

Perhaps the best head-to-head comparison of elite engineers and AI agents was published in November by METR, a leading AI evaluations group. The researchers created novel, realistic, challenging, and unconventional machine learning tasks to compare human experts with AI agents. The AI agents outperformed the human experts when both were given two hours to work, but the median engineer came out ahead over longer time budgets.

But even at eight hours, the best AI agents still managed to beat well over one-third of the human experts. The METR researchers emphasized that there was a “relatively limited effort to set up AI agents to succeed at the tasks, and we strongly expect better elicitation to result in much better performance on these tasks.” They also highlighted how much cheaper the AI agents were than their human counterparts.

The problem with invisible innovation

The hidden improvements in AI over the last year may not represent as big a leap in overall performance as the jump from GPT-3.5 to GPT-4, and it is possible we will never see a jump that big again. But the narrative that there hasn’t been much progress since then is undermined by significant under-the-radar advancements, and this invisible progress could leave us dangerously unprepared for what is to come.

The big risk is that policymakers and the public tune out this progress because they can’t see the improvements first-hand. Everyday users will still encounter frequent hallucinations and basic reasoning failures, which also get triumphantly amplified by AI skeptics. These obvious errors make it easy to dismiss AI’s rapid advancement in more specialized domains. 

There’s a common view in the AI world, shared by both proponents and opponents of regulation, that the U.S. federal government won’t mandate guardrails on the technology unless there’s a major galvanizing incident. Such an incident, often called a “warning shot,” could be innocuous, like a credible demonstration of dangerous AI capabilities that doesn’t harm anyone. But it could also take the form of a major disaster caused or enabled by an AI system, or a society upended by devastating labor automation. 

The worst-case scenario is that AI systems become scary powerful but no warning shots are fired (or heeded) before a system permanently escapes human control and acts decisively against us.

Last month, Apollo Research, an evaluations group that works with top AI companies, published evidence that, under the right conditions, the most capable AI models were able to scheme against their developers and users. When instructed to pursue a goal strongly, the systems sometimes attempted to subvert oversight, fake alignment, and hide their true capabilities. In rare cases, systems engaged in deceptive behavior without nudging from the evaluators. When the researchers inspected the models’ reasoning, they found that the models knew what they were doing, using language like “sabotage, lying, manipulation.”

This is not to say that these models are about to conspire against humanity. But there has been a disturbing trend: as AI models get smarter, they get better at following instructions and understanding the intent behind their guidelines, but they also get better at deception. Smarter models may also be more likely to engage in dangerous behavior. For instance, one of the world’s most capable models, OpenAI’s o1, was far more likely to double down on a lie after being caught by the Apollo evaluators.

I fear that the gap between AI’s public face and its true capabilities is widening. While consumers see chatbots that still can’t count the letters in “strawberry,” researchers are documenting systems that can match PhD-level expertise and engage in sophisticated deception. This growing disconnect makes it harder for the public and policymakers to gauge AI’s real progress—progress they’ll need to understand to govern it appropriately. The risk isn’t that AI development has stalled; it’s that we’re losing our ability to track where it’s headed.