Some thoughts on the Sutton interview
Dwarkesh Podcast
Oct 04
This discussion explores the evolving relationship between imitation learning and reinforcement learning in the development of advanced AI systems, emphasizing how current large language models fit into broader trajectories toward artificial general intelligence.
Imitation learning and reinforcement learning (RL) are not competing paradigms but complementary stages in AI training. Imitation provides an efficient starting point using human-generated data, while RL refines behavior through environmental feedback. Pre-trained LLMs act as strong priors for RL, similar to how foundational technologies enable further innovation. Although today's models lack true continual learning and suffer from low sample efficiency, techniques like test-time fine-tuning and in-context adaptation offer pathways toward more dynamic learning.

Current systems absorb limited information per interaction compared to animals, highlighting a key gap. However, viewing imitation as short-horizon RL bridges conceptual divides. While LLMs begin with imitation and add RL later, biological evolution uses meta-RL to produce agents that can imitate—suggesting different routes to similar outcomes.

Despite limitations such as dependence on static datasets and poor online learning, existing models already exhibit deep world knowledge. The path to AGI may involve hybrid approaches that integrate continuous adaptation, aligning ultimately with Richard Sutton's vision of scalable, self-improving architectures.
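The "imitation as prior, RL as refinement" pipeline can be sketched on a toy problem. This is a minimal illustration, not anything from the interview: a 4-armed bandit where a hypothetical demonstrator prefers a decent-but-suboptimal arm, behavior cloning gives the policy a strong prior, and REINFORCE with environment reward then shifts it toward the truly best arm. All numbers (reward values, demo distribution, learning rates) are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy 4-armed bandit; arm 3 pays best, but the (hypothetical)
# demonstrator mostly pulls arm 1.
true_rewards = np.array([0.1, 0.6, 0.3, 0.9])
demos = rng.choice(4, size=200, p=[0.05, 0.8, 0.1, 0.05])

logits = np.zeros(4)  # tabular softmax policy

# Stage 1: imitation (behavior cloning) installs a strong prior.
for a in demos:
    pi = softmax(logits)
    grad = pi.copy()
    grad[a] -= 1.0          # gradient of -log pi(a)
    logits -= 0.1 * grad

# Stage 2: RL (REINFORCE with a moving-average baseline) refines it
# using environment reward instead of demonstrations.
baseline = 0.0
for _ in range(5000):
    pi = softmax(logits)
    a = rng.choice(4, p=pi)
    r = true_rewards[a] + rng.normal(scale=0.1)
    baseline += 0.01 * (r - baseline)
    # Ascent on (r - baseline) * grad log pi(a)
    logits += 0.1 * (r - baseline) * (np.eye(4)[a] - pi)

print("final policy:", softmax(logits).round(3))
```

The point of the two stages: after cloning, the policy already concentrates probability on reasonable arms, so the RL phase explores from a sensible starting point rather than from scratch.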
02:43
Imitation learning is continuous with, and complementary to, RL
03:22
Imitation learning can be seen as short-horizon reinforcement learning.
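The "short-horizon RL" framing can be made concrete: for a softmax policy, the behavior-cloning gradient on a demonstrated action is identical to a one-step REINFORCE gradient with reward 1 for taking that action. The sketch below checks this numerically on a toy 4-action policy (the policy parameterization and numbers are illustrative, not from the interview).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Tabular softmax policy over 4 actions, parameterized by logits.
rng = np.random.default_rng(0)
logits = rng.normal(size=4)
demo_action = 2  # the action the demonstrator took

pi = softmax(logits)

# Behavior-cloning gradient: d/dlogits of -log pi(demo_action)
bc_grad = pi.copy()
bc_grad[demo_action] -= 1.0

# Horizon-1 REINFORCE gradient: d/dlogits of -(R * log pi(a))
# with reward R = 1 for matching the demonstration.
R = 1.0
reinforce_grad = R * pi.copy()
reinforce_grad[demo_action] -= R

assert np.allclose(bc_grad, reinforce_grad)
print("BC gradient equals horizon-1 REINFORCE gradient")
```

In this view, imitation is RL with a horizon of one step and a reward of 1 on the demonstrated action, which is why the two blend rather than compete.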
08:26
LLMs trained with outcome-based rewards learn very little per episode compared to biological learners.
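A rough information-budget calculation shows why outcome-based rewards are so sparse. A binary pass/fail reward conveys at most 1 bit per episode, while supervised next-token imitation receives a signal at every token. The per-token figure and episode length below are assumed for illustration only.

```python
import math

# A binary outcome reward carries at most log2(2) = 1 bit per episode.
outcome_bits = math.log2(2)

# Imitation gets feedback at every token. Assume a 1,000-token episode
# and ~1 bit of usable signal per token (a rough, hypothetical figure).
tokens = 1_000
bits_per_token = 1.0
imitation_bits = tokens * bits_per_token

print(f"outcome reward:   {outcome_bits:.0f} bit/episode")
print(f"imitation signal: {imitation_bits:.0f} bits/episode")
```

Under these assumptions the per-episode signal differs by roughly three orders of magnitude, which is one way to frame the gap with biological learners, who extract far more than one bit from each interaction.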
10:33
If LLMs achieve AGI first, successor systems will likely be based on Richard Sutton's vision.