scripod.com

The Frontier of Spatial Intelligence with Fei-Fei Li

As AI evolves beyond language and into the physical world, pioneers Fei-Fei Li and Justin Johnson are leading a transformative shift toward spatial intelligence. Their work, rooted in decades of research, is now materializing through World Labs and its groundbreaking product, Marble. This conversation explores how the foundations of modern AI are being extended to understand and generate 3D environments, opening new frontiers across technology and industry.
The discussion traces the evolution of AI from the early days of deep learning and the creation of ImageNet to today’s generative models and spatial intelligence. Fei-Fei Li and Justin Johnson reflect on key algorithmic and computational advances, such as Transformers, diffusion models, and neural radiance fields (NeRF), that have enabled machines to move beyond 2D image analysis. They emphasize the limitations of language-centric models in capturing 3D reality and advocate for AI systems that perceive, reason about, and generate dynamic spatial environments. World Labs’ mission centers on building Large World Models that simulate real-world geometry, physics, and semantics. With applications in AR, VR, robotics, and creative tools like Marble, their vision aims to bridge virtual and physical worlds. Success is measured not just by technical milestones but by enabling intuitive, immersive interactions with AI-generated 3D spaces.
05:07
The fundamental importance of data in AI was realized during early research at Caltech.
08:03
Deep learning was unlocked by both increased compute and access to large labeled datasets.
12:00
Neural style transfer inspired breakthroughs in generative AI.
16:21
The North Star is to unlock spatial intelligence.
18:36
The right time to build spatial intelligence is now, driven by smartphone camera data.
21:15
Ben Mildenhall's NeRF paper revolutionized 3D reconstruction with efficient neural rendering.
23:15
Reconstruction and generation in computer vision are converging through NeRF and diffusion models.
27:36
3D world representation better aligns with physical reality than 2D video or token-based models.
29:04
Spatial intelligence is the North Star for building intelligent systems that understand 3D worlds.
40:00
Ben Mildenhall and Christoph Lassner are recognized as legends in the field of spatial intelligence.
41:57
Fei-Fei Li and Justin Johnson achieved a long-standing goal in spatial intelligence.