
Why AI evals are the hottest new skill for product builders | Hamel Husain & Shreya Shankar (creators of the #1 eval course)

In this insightful conversation, Hamel Husain and Shreya Shankar break down the essential practice of AI evaluations, emphasizing their role as a foundational discipline for building effective AI-powered products. They move beyond buzzwords to reveal how structured, human-led evals enable teams to systematically understand model behavior, uncover hidden failure modes, and drive meaningful product improvements.
The discussion outlines a step-by-step framework for creating impactful AI evals, starting with manual error analysis: reading real-world traces and open coding them to identify recurring issues. The hosts stress that LLMs cannot replace human judgment in early-stage analysis because they lack the necessary context. They introduce axial coding to categorize errors and recommend reaching theoretical saturation before scaling. The episode contrasts code-based evals, which suit deterministic checks, with LLM-as-judge approaches for nuanced assessments, and emphasizes validating those judges against human consensus. Evals are positioned not as one-time tests but as evolving specifications that guide product development, surpassing traditional PRDs in dynamic AI systems. Common pitfalls include over-reliance on automation, skipping deep error analysis, and mistaking 'vibes' for validation. By focusing on high-impact failure modes, teams can implement robust evals efficiently, with minimal ongoing effort once the process is set up, ensuring AI products deliver real user value.
05:00
Evals help move beyond vibe checks by providing measurable feedback for AI applications.
16:01
Looking at the raw data is crucial when analyzing an LLM application.
22:27
The AI told a user about non-existent virtual tours, revealing a hallucination error.
23:54
Manual open coding is crucial because LLMs lack context for accurate error assessment.
25:23
One trusted person with domain expertise should lead AI evaluations to avoid overcomplication.
31:05
Theoretical saturation is reached when no new problem types are discovered during analysis.
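The saturation idea can be sketched in code: keep open-coding traces until a stretch of consecutive traces surfaces no new failure-mode label. This is a minimal toy illustration; the `patience` threshold and label names are assumptions, not from the episode.

```python
# Toy illustration of theoretical saturation: stop open-coding once a
# run of consecutive traces yields no new failure-mode labels.
def reached_saturation(labels_per_trace, patience=20):
    """labels_per_trace: list of per-trace label lists, in review order."""
    seen = set()
    since_new = 0  # traces reviewed since the last new label appeared
    for labels in labels_per_trace:
        new = set(labels) - seen
        if new:
            seen |= new
            since_new = 0
        else:
            since_new += 1
        if since_new >= patience:
            return True
    return False

# Two new labels, then 20 traces with nothing new: saturated.
print(reached_saturation([["hallucination"], ["formatting"]] + [[]] * 20))
```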
31:39
Axial codes act as failure-mode labels to cluster and identify the most common AI errors.
44:39
17 conversational flow issues were identified using a pivot table analysis.
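A minimal stand-in for that pivot-table step, using `collections.Counter` over toy open-coded traces (the trace IDs and labels here are invented for illustration):

```python
# Tally axial codes to surface the most common failure modes;
# Counter stands in for a spreadsheet pivot table over labeled traces.
from collections import Counter

open_codes = [
    ("trace_01", "conversational_flow"),
    ("trace_02", "hallucination"),
    ("trace_03", "conversational_flow"),
    ("trace_04", "formatting"),
    ("trace_05", "conversational_flow"),
]

counts = Counter(code for _, code in open_codes)
for code, n in counts.most_common():
    print(code, n)  # most frequent failure mode first
```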
46:06
Dumb engineering errors in AI, like formatting mistakes, don't require full evals; they're obvious to fix.
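Such formatting mistakes are exactly where a purely code-based check fits: a deterministic assertion instead of an LLM judge. A sketch, assuming a hypothetical output schema with `subject` and `body` keys:

```python
# A "dumb" deterministic check: pass only if the model returned valid
# JSON with the required keys (schema is a hypothetical example).
import json

def check_output(raw: str) -> bool:
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and {"subject", "body"} <= obj.keys()

print(check_output('{"subject": "Hi", "body": "Hello"}'))  # True
print(check_output("Sure! Here is the JSON:"))             # False
```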
51:05
LLM judges can reliably output pass or fail results for complex AI behaviors.
52:10
Use binary yes/no judgments instead of rating scales for reliable LLM evals.
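A sketch of what a binary judge can look like. The prompt wording, the real-estate framing, and `call_llm` are illustrative assumptions; swap in your own model client:

```python
# Binary LLM-as-judge sketch: force a single pass/fail token so the
# verdict can be compared directly against human labels.
JUDGE_PROMPT = """You are evaluating a real-estate assistant's reply.
Fail the reply if it invents amenities (e.g. virtual tours) that are
not in the listing data. Answer with exactly one word: pass or fail.

Listing data: {listing}
Assistant reply: {reply}"""

def judge(listing: str, reply: str, call_llm) -> bool:
    """call_llm: any function mapping a prompt string to a completion."""
    answer = call_llm(JUDGE_PROMPT.format(listing=listing, reply=reply))
    return answer.strip().lower() == "pass"
```

Dependency-injecting `call_llm` keeps the judge testable with a stub before spending tokens on a real model.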
57:19
High agreement percentages between LLM and human judges can be misleading when errors are rare.
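A toy illustration of why raw agreement misleads: with a 5% failure rate, a judge that always says "pass" agrees with humans 95% of the time yet has zero chance-corrected agreement (Cohen's kappa). The data here is fabricated for the arithmetic only.

```python
# A judge that never flags failures still scores 95% raw agreement
# when only 5% of traces fail -- kappa exposes the trick.
human = ["fail"] * 5 + ["pass"] * 95
judge = ["pass"] * 100

agreement = sum(h == j for h, j in zip(human, judge)) / len(human)

def kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e) if p_e < 1 else 1.0

print(agreement)            # 0.95
print(kappa(human, judge))  # 0.0
```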
1:03:19
Experts can't anticipate all failure modes in LLM output validation.
1:05:09
Fixing problems doesn't always require writing an eval.
1:07:41
Product managers can build profitable products using the skill set of implementing LLM judges for systematic improvement.
1:09:57
Strong opinions against AI evals often ignore their widespread practical use in development.
1:17:48
A/B tests should be powered by actual error analysis, not hypotheticals.
1:18:26
Evals are essentially data science for understanding AI product performance.
1:22:30
More people should adopt structured approaches to application-specific evals.
1:23:02
There's high demand for Hamel and Shreya's Maven course.
1:29:59
The goal of evals is to improve the product, not just catch bugs.
1:33:19
An AI sending factually correct emails isn't good enough; product thinking is essential for real effectiveness.
1:36:30
Students get 10 months of free, unlimited access to all course-related AI content and resources.
1:40:57
Hamel's life motto is 'Keep learning and think like a beginner.'