For much of the past decade, progress in artificial intelligence has been measured by how fluently machines handle language. In her onstage conversation on Tuesday (Feb. 3) with Jeetu Patel, chief product officer at Cisco, World Labs CEO Fei-Fei Li argued that this focus is now running into a harder constraint: systems that reason well in text still lack a reliable understanding of the physical world they are increasingly asked to act in.
As AI systems move from analysis toward execution, Li said, their limitations have less to do with reasoning in text and more to do with understanding and acting within physical environments. “The ability to understand, to reason, to interact with and to navigate the real 3D, 4D physical world is the foundation,” she said, describing spatial intelligence as essential for systems that operate beyond screens.
AI that cannot model space, geometry and physical cause and effect cannot reliably support decisions in robotics, design, simulation, healthcare or logistics. As those applications move from pilots into production, the cost of error increases sharply.
Why Spatial Models Matter
Li drew a clear distinction between visual generation and spatial understanding. Many current models can generate convincing images or video without maintaining consistency across time or perspective. Spatial models, by contrast, must preserve structure so they can support navigation, interaction and repeatability.
That distinction shapes the first product at World Labs, the company Li co-founded. As reported by PYMNTS, Marble, a first-generation spatial model, takes multimodal inputs such as text, images, video or simple 3D prompts and generates environments that users can navigate and interact with. Li described it as early and limited, but functionally different from video models because it maintains geometric consistency.
That capability enables practical uses. Li cited adoption by game developers and virtual production teams who need environments rather than static assets. Robotics groups are using spatial models as training and simulation environments, where systems can test actions before deployment. Architects and designers are applying them to spatial planning.
More unexpectedly, Li said clinical researchers have shown interest in using generated environments for mental health research, where physical context matters but is difficult to control in the real world. Across these examples, the value lies in reducing the cost and risk of experimentation. Spatial models allow organizations to test scenarios digitally that would be expensive, slow or unsafe to recreate physically.
Data Constraints Shape the Pace of Progress
Li was explicit about why spatial AI will not scale at the pace seen in language models. Text data is plentiful, standardized and easily observable. Physical data is none of those things.
World models rely on hybrid data strategies that combine internet-scale images and video, simulated data and carefully captured real-world data. Li compared the approach to autonomous vehicle development, where companies spent years collecting real and simulated driving data before achieving limited commercial deployment.
As a result, spatial models today are smaller and trained with less compute than frontier language models. That gap reflects both data scarcity and the relative youth of the field. “Just because the North Star is clear doesn’t mean the journey is short,” Li said, emphasizing that progress depends on architecture, data quality and simulation fidelity rather than scale alone.
She applied the same caution to robotics. General-purpose robots remain a long-term objective, but Li described manipulation and dexterity as unresolved challenges. Unlike autonomous vehicles, which operate on constrained surfaces and avoid contact, robots must interact with objects in three dimensions without damaging them. Training data for those tasks remains limited.
A Different Constraint Than Compute
Li’s remarks challenge narratives that frame AI progress primarily through compute scale and capital investment.
She pointed to a different constraint. If artificial intelligence systems are expected to act in the physical world, then data availability, simulation accuracy and domain-specific knowledge may matter as much as parameter counts. Scale alone does not resolve the problem of understanding space.
Near-term value from spatial AI will come from targeted deployments where environments are well-defined and errors are manageable. The broader promise remains intact, but timelines depend less on ambition than on data and execution.