A top AI researcher explains the limitations of current models
Welcome to AI Decoded, Fast Company’s weekly newsletter that breaks down the most important news in the world of AI. You can sign up to receive this newsletter every week via email here.
François Chollet on AI benchmarks
I wrote an exclusive feature this week about the launch of a new AI benchmark called ARC-AGI-3. The benchmark was created by influential AI researcher François Chollet, who also created the widely used Keras deep learning framework, a simplified toolkit for building AI models. Chollet has long argued that current AI models are limited in their ability to navigate novel situations and problems. The ARC test, which most humans can master but most AI systems cannot, is designed to lay bare that limitation. My interview with Chollet contained a lot of general insights that didn’t make it into the story. Here are some of them (with my annotations in bold).
ARC-AGI-3 asks AI agents to navigate a series of simplistic video games, without instructions. Here’s Chollet on why current models struggle to do that:
“It’s because they are reliant on memorization and retrieval, and the game is something they’ve never seen before. They’ve never played that particular game before or games like it, because each one is unique. So they’re lost. But a human is generally intelligent. A human is never lost. A human figures it out on the fly because they have fluid intelligence.”
I began to imagine how I would approach figuring out the games. I suggested to Chollet that my main strategy would be thinking about similar scenarios I’d seen in the past or in other contexts, and trying to apply them.
“Models have a lot of abstractions encoded in them. They have in fact more knowledge than you do. But they have very low ability to recombine that knowledge at test time to make sense of something they’ve never seen before. It’s the way the entire paradigm works. We are really good at absorbing knowledge, absorbing lots and lots of patterns. Better than the human brain, and at a much bigger scale. We are very bad at fluid intelligence, which is taking those patterns and actually combining them on the fly to form a new model [of a problem].”
On what exactly an AI model would need in order to score highly on the benchmark:
“They need small amounts of world modeling and continual learning, continual learning being the idea that on one level you’re going to learn one concept, on the next level you’re going to reuse that concept but learn a new one, and on the third level you’re going to add a third concept, and so on. It’s continual learning.”
Not only do AI models need to continually learn, but they also need to form a model of the world that captures causes and effects. Chollet explains:
“In general, all the ingredients you need in order to solve ARC 3 the right way, without brute forcing, without training on millions of games, are the ingredients of human intelligence but on a very small scale. The control space is tiny, the sensor space is tiny, the mechanics of the worlds are very simple, and your learning time scale is very short. But it’s fundamentally about dealing with the unknown. You have to explore. You have to try things and then build, step by step, bit by bit, a causal model of what’s going on, like ‘what happened when I pressed this button?'”
“Then you have to figure out what you want to be doing in this world. Like a child learning to move around. They have to figure out how their sense of space works, how the environment responds to what they’re doing. And when they start being able to do things, like crawling, they have to figure out what they want to be doing. Where do I want to crawl? If I can grab an object, why would I want to grab this object or that object?”
On what it would mean if AI models did incorporate these ingredients and got close to a perfect score playing the games in ARC-AGI-3:
“The causal models you need to build to solve these games are dramatically simpler than the causal models of the world that you have in your head. And the continual learning you have to do to solve one of these games is on the scale of a few minutes: five minutes, 10 minutes of gameplay. A human does decades of continual learning. So it’s the right ingredients at a very small scale. It’s a step in the right direction, but you cannot say this is human level.”
OpenAI may have zapped Sora as part of a pivot toward ‘world modeling’
OpenAI has decided to shut down its Sora app, which lets users generate AI videos and then share them on its TikTok-style social feed. Its reason for doing so may dovetail with a growing trend among AI video generation players. The AI lab may be pivoting toward using its AI video generation technology for world modeling and simulation.
“As we focus and compute demand grows, the Sora research team continues to focus on world simulation research to advance robotics that will help people solve real-world, physical tasks,” an OpenAI spokesperson told Axios. That technology can also be used in game development, digital twins, and special effects in visual entertainment. AI video generation companies Moonvalley and Runway AI are also moving toward developing world models.
OpenAI might also have been spooked by the obvious copyright infringement risk of apps like Sora. Many copyright holders, including Hollywood studios and actors, were shocked to see that Sora often depicted well-known faces and had no clear guardrails for controlling that use. OpenAI responded by offering to give Hollywood studios and actors more control over their IP and likenesses on the platform.
Disney characters were among the first copyrighted assets to show up in Sora videos. But the two companies made a deal: Disney was invited to invest a billion dollars in OpenAI and agreed to allow the use of classic Disney characters in Sora videos. The Hollywood Reporter now reports that the deal is off.
OpenAI will continue building its video generation models, and it’s possible that something like Sora will be added to ChatGPT.
More AI coverage from Fast Company:
- What happens when an AI agent decides to email you
- This Microsoft security team stress-tests AI for its worst-case scenarios
- Why breaking news still wins in the age of AI
- This artist’s work has been shown at MoMA. Now it’s training AI
Want exclusive reporting and trend analysis on technology, business innovation, future of work, and design? Sign up for Fast Company Premium.