After LLMs: Spatial Intelligence and World Models — Fei-Fei Li & Justin Johnson, World Labs

Shownote

Fei-Fei Li and Justin Johnson are cofounders of World Labs, which recently launched Marble (https://marble.worldlabs.ai/), a new kind of generative “world model” that can create editable 3D environments from text, images, and other spatial inputs. Marbl...

Highlights

In this episode, Fei-Fei Li and Justin Johnson, pioneers in computer vision and AI, discuss their latest venture, World Labs, and its groundbreaking spatial intelligence model, Marble. Their conversation traces a journey from foundational work in image recognition to the next frontier of AI: understanding and generating 3D worlds.
10:29
There's a scaling limit in performance per watt from Hopper to Blackwell, opening room for new architectures.
18:04
Vision and language modeling might not be very different.
27:43
Things invented for fun can end up enabling serious breakthroughs, like the AI revolution from misused graphics chips.
31:40
Marble enables precise camera placement by understanding 3D space, unlike most video generative models.
33:29
Gaussian splats can be enhanced with physical properties for physics simulation.
36:20
Regenerating entire scenes allows more general interaction but is computationally expensive.
40:32
Spatial intelligence involves the ability to reason, understand, move, and interact in space.
54:17
Transformers are natively models of sets, not sequences, with permutation-equivariant operations.
57:04
Intellectual fearlessness is essential for pioneers in AI research.
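
The 54:17 highlight, that Transformers are natively models of sets with permutation-equivariant operations, can be checked numerically. The sketch below is an illustration only, not code from the episode or from World Labs. It assumes a single attention head with identity query/key/value projections and no positional encoding; the names `self_attention` and `permute` are hypothetical helpers. Shuffling the input tokens and then applying attention should give the same result as applying attention and then shuffling the outputs.

```python
import math

def self_attention(xs):
    """Single-head self-attention over a list of token vectors.

    Assumptions for illustration: identity Q/K/V projections,
    scaled dot-product scores, and no positional encoding.
    """
    d = len(xs[0])
    out = []
    for q in xs:
        # Scaled dot-product score of this query against every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        # Numerically stable softmax over the scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Output is the attention-weighted average of the token vectors.
        out.append([sum(w / z * v[j] for w, v in zip(weights, xs)) for j in range(d)])
    return out

def permute(rows, perm):
    """Reorder rows so that new row i is old row perm[i]."""
    return [rows[i] for i in perm]

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0]]
perm = [2, 0, 3, 1]

# Equivariance: attention(permute(x)) should equal permute(attention(x)).
shuffled_then_attended = self_attention(permute(tokens, perm))
attended_then_shuffled = permute(self_attention(tokens), perm)

max_diff = max(
    abs(x - y)
    for row_a, row_b in zip(shuffled_then_attended, attended_then_shuffled)
    for x, y in zip(row_a, row_b)
)
print("max difference:", max_diff)
```

The difference should be zero up to floating-point rounding. Learned projections applied to each token independently preserve this property, while adding a positional encoding to the inputs breaks it, which is how sequence order is usually injected.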

Chapters

Introduction and the Fei-Fei Li & Justin Johnson Partnership
00:00
From ImageNet to World Models: The Evolution of Computer Vision
02:00
Dense Captioning and Early Vision-Language Work
12:33
Spatial Intelligence: Beyond Language Models
19:39
Introducing Marble: World Labs' First Spatial Intelligence Model
28:49
Gaussian Splats and the Technical Architecture of Marble
33:24
Physics, Dynamics, and the Future of World Models
35:50
Use Cases: From Creative Industries to Robotics and Embodied AI
37:37
Multimodality and the Interplay of Language and Space
44:00
Hiring, Research Directions, and the Future of World Labs
57:03

Transcript

Alessio: Hey everyone, welcome to the Latent Space podcast. This is Alessio, founder of Kernel Labs, and I'm joined by swyx, editor of Latent Space. And we are so excited to be in the studio with Fei-Fei and Justin of World Labs. Welcome.
Fei-Fei Li: We'r...