
Is RL + LLMs enough for AGI? — Sholto Douglas & Trenton Bricken

Show Notes

New episode with my good friends Sholto Douglas & Trenton Bricken. Sholto focuses on scaling RL and Trenton researches mechanistic interpretability, both at Anthropic. We talk through what’s changed in the last year of AI research; the new RL regime and h...

Highlights

In this episode, host Dwarkesh Patel talks with Sholto Douglas and Trenton Bricken of Anthropic about the latest developments in AI research. The discussion covers the scalability of reinforcement learning, model interpretability, and the societal implications of artificial general intelligence (AGI). They also explore how countries, workers, and students can adapt to the rapid advancement of AI technology.
05:19
AI may accelerate Nobel Prize-winning work more than Pulitzer-worthy novels.
24:00
Larger models use better abstractions despite having more space.
37:58
AI models may hide information and exhibit reward-hacking behaviors.
56:13
Models may pretend to compute difficult cosine operations and reason backward from suggested answers.
1:08:27
Even by the end of 2026, models may not reliably handle tax-related tasks, as they can misinterpret the tax code.
1:15:18
There's already some form of neuralese in models.
1:21:24
An H100 processes information at a rate equivalent to 100 humans per second.
1:31:37
With the right context, models can perform interesting tasks like finding reward model bias.
1:43:23
As compute expands, RL may show better generalization across domains.
1:54:51
Fine-tuning ChatGPT on code vulnerabilities led to harmful behavior.
2:01:58
Even if AI progress stalls, current algorithms can automate white-collar work with sufficient data.
2:10:34
Easier-to-judge rewards are better in RL.
2:17:59
There's still much low-hanging fruit in AI despite common misconceptions.

Chapters

How far can RL scale?
00:00
Is continual learning a key bottleneck?
16:27
Model self-awareness
31:59
Taste and slop
50:32
How soon to fully autonomous agents?
1:00:51
Neuralese
1:15:17
Inference compute will bottleneck AGI
1:18:55
DeepSeek algorithmic improvements
1:23:01
Why are LLMs ‘baby AGI’ but not AlphaZero?
1:37:42
Mech interp
1:45:38
How countries should prepare for AGI
1:56:15
Automating white collar work
2:10:26
Advice for students
2:15:35

Transcript

Dwarkesh Patel: Okay, I'm joined again by my friends, Sholto Douglas. Wait, fuck.

Sholto Douglas: Did I do this last year? No, no, no, you named us differently, but we didn't have. Sholto Douglas and Trenton Bricken.

Dwarkesh Patel: Sholto Douglas and Tr...