Why AI evals are the hottest new skill for product builders | Hamel Husain & Shreya Shankar (creators of the #1 eval course)
Shownote
Hamel Husain and Shreya Shankar teach the world’s most popular course on AI evals and have trained over 2,000 PMs and engineers (including many teams at OpenAI and Anthropic). In this conversation, they demystify the process of developing effective evals, ...
Highlights
In this conversation, Hamel Husain and Shreya Shankar break down the practice of AI evaluations and argue that it is a foundational discipline for building effective AI-powered products. They move beyond buzzwords to show how structured, human-led evals help teams systematically understand model behavior, uncover hidden failure modes, and drive meaningful product improvements.
Chapters
00:00 Introduction to Hamel and Shreya
04:57 What are evals?
09:56 Demo: Examining real traces from a property management AI assistant
16:51 Writing notes on errors
23:54 Why LLMs can’t replace humans in the initial error analysis
25:16 The concept of a “benevolent dictator” in the eval process
28:07 Theoretical saturation: when to stop
31:39 Using axial codes to help categorize and synthesize error notes
44:39 The results
46:06 Building an LLM-as-judge to evaluate specific failure modes
48:31 The difference between code-based evals and LLM-as-judge
52:10 Example: LLM-as-judge
54:45 Testing your LLM judge against human judgment
1:00:51 Why evals are the new PRDs for AI products
1:05:09 How many evals you actually need
1:07:41 What comes after evals
1:09:57 The great evals debate
1:15:15 Why dogfooding isn’t enough for most AI products
1:18:23 OpenAI’s Statsig acquisition
1:22:28 Tips and tricks for implementing evals effectively
1:23:02 The Claude Code controversy and the importance of context
1:24:13 Common misconceptions around evals
1:30:37 The time investment
1:33:38 Overview of their comprehensive evals course
1:37:57 Lightning round and final thoughts
Transcript
Lenny Rachitsky: To build great AI products, you need to be really good at building evals.
Hamel Husain: It's the highest ROI activity you can engage in. This process is a lot of fun. Everyone that does this immediately gets addicted to it. When you're bu...
