
Latent Space: The AI Engineer Podcast

swyx + Alessio

The podcast by and for AI Engineers! In 2024, over 2 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0. We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing to the first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), the tiny corp (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al. Full show notes always on https://latent.space
Nov 03

How Zyphra went all-in on AMD + Why Devs feel faster with AI but are slower — with Quentin Anthony

In this conversation with Quentin Anthony, Head of Model Training at Zyphra and advisor at EleutherAI, we explore the cutting-edge world of building foundation models on AMD hardware and the future of edge AI deployment. Quentin shares his journey from working on Oak Ridge National Lab's Frontier supercomputer to leading Zyphra's ambitious move to AMD MI300X GPUs, where they're achieving performance that beats NVIDIA H100s on certain workloads while dramatically reducing costs.

The discussion dives deep into the technical challenges of kernel development, with Quentin explaining why he often bypasses high-level frameworks like Triton to write directly in ROCm or even GPU assembly when necessary. He reveals how Zyphra's hybrid transformer-Mamba models like Zamba2 can match Llama 3 8B performance at 7B parameters, optimized specifically for edge deployment across a spectrum from 1.2B models for phones to 7B for desktops.

Quentin candidly discusses his experience in the controversial METR software engineering productivity study, where he was one of the few developers who showed a measurable speedup from AI tools. He shares practical insights on avoiding the "slot machine effect" of endlessly prompting models, the importance of context rot awareness, and why he prefers direct API access over tools like Cursor to maintain complete control over model context.

The conversation also covers the state of open source AI research, with Quentin arguing that siloed, focused teams with guaranteed funding produce better results than grand collaborative efforts. He explains why kernel datasets alone won't solve the GPU programming problem, the challenges of evaluating kernel quality, and why companies should invest more in ecosystem development rather than traditional marketing.
https://www.linkedin.com/in/quentin-anthony/
https://www.zyphra.com/post/zamba2-7b

Key Topics:
• AMD MI300X advantages: 192GB VRAM, superior memory bandwidth
• Writing kernels from PTX/AMD GCN assembly up through CUDA/ROCm
• Hybrid attention-Mamba architectures and optimal sparsity ratios
• The METR productivity study: achieving positive AI speedup
• Context rot and why shorter conversations beat long threads
• Why physicists make great ML engineers ("embryonic stem cells")
• Edge deployment strategies from phones to local clusters
• The future of on-device vs cloud inference routing
• EleutherAI's focus on interpretability with fully open pipelines
• Building velocity-focused teams over position-based hiring
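Quentin's habit of keeping conversations short and hitting the API directly is easy to picture in code. A minimal Python sketch of the idea (the message format mirrors common chat APIs; the exact trimming policy is an illustrative assumption, not his actual workflow):

```python
def trim_context(messages, max_messages=6):
    """Keep the system prompt plus only the most recent messages,
    discarding older turns before they degrade model attention."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]

# A long-running thread accumulates 20 turns of history...
history = [{"role": "system", "content": "You are a kernel-tuning assistant."}]
for i in range(10):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_context(history, max_messages=6)
print(len(trimmed))            # 7: system prompt + last 6 messages
print(trimmed[-1]["content"])  # 'answer 9'
```

Starting a fresh thread per task is the degenerate case of the same policy: throw away everything except the system prompt.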
Oct 31

⚡️ Ship AI recap: Agents, Workflows, and Python — w/ Vercel CTO Malte Ubl

In this conversation with Malte Ubl, CTO of Vercel (http://x.com/cramforce), we explore how the company is pioneering the infrastructure for AI-powered development through their comprehensive suite of tools including workflows, AI SDK, and the newly announced agent ecosystem. Malte shares insights into Vercel's philosophy of "dogfooding" - never shipping abstractions they haven't battle-tested themselves - which led to extracting their AI SDK from v0 and building production agents that handle everything from anomaly detection to lead qualification.

The discussion dives deep into Vercel's new Workflow Development Kit, which brings durable execution patterns to serverless functions, allowing developers to write code that can pause, resume, and wait indefinitely without cost. Malte explains how this enables complex agent orchestration with human-in-the-loop approvals through simple webhook patterns, making it dramatically easier to build reliable AI applications.

We explore Vercel's strategic approach to AI agents, including their DevOps agent that automatically investigates production anomalies by querying observability data and analyzing logs - solving the recall-precision problem that plagues traditional alerting systems. Malte candidly discusses where agents excel today (meeting notes, UI changes, lead qualification) versus where they fall short, emphasizing the importance of finding the "sweet spot" by asking employees what they hate most about their jobs.

The conversation also covers Vercel's significant investment in Python support, bringing zero-config deployment to Flask and FastAPI applications, and their vision for security in an AI-coded world where developers "cannot be trusted." Malte shares his perspective on how CTOs must transform their companies for the AI era while staying true to their core competencies, and why maintaining strong IC (individual contributor) career paths is crucial as AI changes the nature of software development.
What was launched at Ship AI 2025:

AI SDK 6.0 & Agent Architecture
• Agent Abstraction Philosophy: AI SDK 6 introduces an agent abstraction where you can "define once, deploy everywhere". How does this differ from existing agent frameworks like LangChain or AutoGPT? What specific pain points did you observe in production that led to this design?
• Human-in-the-Loop at Scale: The tool approval system with needsApproval: true gates actions until human confirmation. How do you envision this working at scale for companies with thousands of agent executions? What's the queue management and escalation strategy?
• Type Safety Across Models: AI SDK 6 promises "end-to-end type safety across models and UI". Given that different LLMs have varying capabilities and output formats, how do you maintain type guarantees when swapping between providers like OpenAI, Anthropic, or Mistral?

Workflow Development Kit (WDK)
• Durability as Code: The use workflow primitive makes any TypeScript function durable with automatic retries, progress persistence, and observability. What's happening under the hood? Are you using event sourcing, checkpoint/restart, or a different pattern?
• Infrastructure Provisioning: Vercel automatically detects when a function is durable and dynamically provisions infrastructure in real-time. What signals are you detecting in the code, and how do you determine the optimal infrastructure configuration (queue sizes, retry policies, timeout values)?

Vercel Agent (beta)
• Code Review Validation: The Agent reviews code and proposes "validated patches". What does "validated" mean in this context? Are you running automated tests, static analysis, or something more sophisticated?
• AI Investigations: Vercel Agent automatically opens AI investigations when it detects performance or error spikes using real production data. What data sources does it have access to? How does it distinguish between normal variance and actual anomalies?
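The durable-execution idea behind the Workflow Development Kit (pause, resume, survive crashes, never repeat completed work) can be approximated with explicit checkpointing: persist each step's result so a re-run skips straight past it. A generic Python sketch of the pattern, not Vercel's actual implementation (the store, file path, and step names are invented for illustration):

```python
import json
import os

class DurableRun:
    """Replay-based durability: completed steps are checkpointed to disk,
    so re-executing the workflow after a crash resumes where it left off."""
    def __init__(self, path):
        self.path = path
        self.state = json.load(open(path)) if os.path.exists(path) else {}

    def step(self, name, fn):
        if name in self.state:            # already done on a previous run
            return self.state[name]
        result = fn()
        self.state[name] = result         # persist before moving on
        with open(self.path, "w") as f:
            json.dump(self.state, f)
        return result

calls = []
def workflow(run):
    a = run.step("fetch", lambda: calls.append("fetch") or 42)
    b = run.step("enrich", lambda: calls.append("enrich") or a * 2)
    return b

path = "/tmp/wf_state.json"
if os.path.exists(path):
    os.remove(path)
print(workflow(DurableRun(path)))   # 84, both steps execute
print(workflow(DurableRun(path)))   # 84 again, served from checkpoints
print(calls)                        # ['fetch', 'enrich']: each step ran only once
```

A "wait indefinitely" step is the same mechanism: the workflow exits without a checkpoint for that step and is simply replayed once the external event (webhook, approval) arrives.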
Python Support
• For the first time, Vercel now supports Python backends natively.

Marketplace & Agent Ecosystem
• Agent Network Effects: The Marketplace now offers agents like CodeRabbit, Corridor, Sourcery, and integrations with Autonoma, Braintrust, and Browser Use. How do you ensure these third-party agents can't access sensitive customer data? What's the security model?

"An Agent on Every Desk" Program
• Vercel launched a new program to help companies identify high-value use cases and build their first production AI agents. It provides consultations, reference templates, and hands-on support to go from idea to deployed agent.
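The needsApproval-style gate mentioned above boils down to interposing a human (or policy) check before a sensitive tool runs. A hypothetical Python sketch of that control flow, not the AI SDK's real plumbing (the tool and the approval policy are invented for illustration):

```python
def run_tool(tool, args, needs_approval, approve):
    """Gate a tool call behind a confirmation callback, in the spirit of
    the needsApproval flag described above. In production the callback
    would enqueue the request for a human rather than decide inline."""
    if needs_approval and not approve(tool.__name__, args):
        return {"status": "rejected", "result": None}
    return {"status": "executed", "result": tool(**args)}

def refund_payment(amount):
    """A hypothetical sensitive tool an agent might want to call."""
    return f"refunded ${amount}"

# Illustrative policy: auto-approve small refunds, reject (escalate) large ones.
approve = lambda name, args: args.get("amount", 0) <= 100

print(run_tool(refund_payment, {"amount": 50}, True, approve)["status"])    # executed
print(run_tool(refund_payment, {"amount": 5000}, True, approve)["status"])  # rejected
```

Combined with a durable workflow, the "rejected" branch can instead park the run until a webhook delivers the human's decision.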
Key Topics:
• The Workflow Development Kit and durable execution patterns
• Building production agents for DevOps, lead qualification, and data analysis
• Why AI SDK stayed low-level while competitors rushed to agent abstractions
• The "agent on every desk" program and forward deployment strategy
• Zero-config Python support and the future of multi-language platforms
• Security models for AI-generated code that assume developer incompetence
• Finding the sweet spot for agent deployment through employee pain points
• How Vercel's anomaly detection + agent investigation solves the SRE sleep problem
• The importance of dogfooding and extracting abstractions from real usage
• Leadership lessons for CTOs navigating the AI transformation
Oct 30

The Agents Economy Backbone - with Emily Glassberg Sands, Head of Data & AI at Stripe

Emily Glassberg Sands is the Head of Data & AI at Stripe, where she leads the organization's efforts to build financial infrastructure for the internet and leverage AI to power Stripe's products. Stripe processes about $1.4 trillion in payments annually (~1.3% of global GDP), making it an exciting opportunity to apply AI & ML at scale. In this episode, Emily shares insights into how Stripe is using AI to solve complex problems like fraud detection, optimizing checkout experiences, and enabling new business models for AI companies. Emily also shares her economist perspective on market efficiency and how Stripe's focus on building economic infrastructure for AI is driving growth across the ecosystem.

We discuss:
• Stripe's domain-specific foundation model and "payments embeddings" that run inline on the charge path to detect sophisticated card-testing at scale (improved detection rates at large users from ~59% to ~97%)
• The launch of the Agentic Commerce Protocol (ACP) with OpenAI, creating a shared standard for how businesses can expose products to AI agents, which is used by Walmart and Sam's Club
• How Stripe is helping AI companies manage new fraud vectors, such as free trial and refund abuse, and the importance of real-time, outcome-based billing
• The impact of AI on Stripe's internal operations, including the use of LLMs for code generation, merchant understanding, and internal tooling
• Why many AI companies are going global day one, and how Stripe's Link network (200M+ consumers) concentrates AI demand
• Whether we're in an AI bubble, why GDP hasn't reflected AI productivity gains yet, and how agentic commerce could expand consumption by removing time constraints for high-income consumers
• Emily's perspective on the changing social contract around AI, the importance of deep thinking, and the role of brand and design in AI-driven products

Where to find Emily Sands
X: https://x.com/emilygsands
LinkedIn: https://www.linkedin.com/in/egsands/

Where to find Shawn Wang
X: https://x.com/swyx
LinkedIn: https://www.linkedin.com/in/shawnswyxwang/

Where to find Alessio Fanelli
X: https://x.com/FanaHOVA
LinkedIn: https://www.linkedin.com/in/fanahova/

Where to find Latent Space
X: https://x.com/latentspacepod
Substack: https://www.latent.space/

Chapters
00:00:00 Introduction and Emily's Role at Stripe
00:09:55 AI Business Models and Fraud Challenges
00:13:49 Extending Radar for AI Economy
00:16:42 Payment Innovation: Token Billing and Stablecoins
00:23:09 Agentic Commerce Protocol Launch
00:29:40 Good Bots vs Bad Bots in AI
00:40:31 Designing the Agents Commerce Protocol
00:49:32 Internal AI Adoption at Stripe
01:04:53 Data Discovery and Text-to-SQL Challenges
01:21:00 AI Economy Analysis: Bubble or Boom?
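Stripe's inline "payments embeddings" idea, scoring a charge by where it lands in an embedding space, can be illustrated with a toy nearest-centroid check. This is purely illustrative: Stripe's production system is a domain-specific foundation model, and the vectors below are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def card_testing_score(charge_vec, fraud_centroid):
    """Score a charge by similarity to a centroid of known card-testing
    activity; higher means more suspicious. A real system would combine
    many signals and thresholds learned from labeled outcomes."""
    return cosine(charge_vec, fraud_centroid)

fraud_centroid = [0.9, 0.1, 0.8]   # hypothetical embedding of past card-testing
suspicious = [0.85, 0.15, 0.75]
normal = [0.1, 0.9, 0.05]
print(card_testing_score(suspicious, fraud_centroid) >
      card_testing_score(normal, fraud_centroid))   # True
```

The appeal of running this "inline on the charge path" is that the score exists before the charge settles, so blocking decisions can be made in real time.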
Oct 16

Why RL Won — Kyle Corbitt, OpenPipe (acq. CoreWeave)

In this deep dive with Kyle Corbitt, co-founder and CEO of OpenPipe (recently acquired by CoreWeave), we explore the evolution of fine-tuning in the age of AI agents and the critical shift from supervised fine-tuning to reinforcement learning. Kyle shares his journey from leading YC's Startup School to building OpenPipe, initially focused on distilling expensive GPT-4 workflows into smaller, cheaper models before pivoting to RL-based agent training as frontier model prices plummeted.

The conversation reveals why 90% of AI projects remain stuck in proof-of-concept purgatory - not due to capability limitations, but reliability issues that Kyle believes can be solved through continuous learning from real-world experience. He discusses the breakthrough of RULER (Relative Universal LLM-Elicited Rewards), which uses LLMs as judges to rank agent behaviors relatively rather than absolutely, making RL training accessible without complex reward engineering.

Kyle candidly assesses the challenges of building realistic training environments for agents, explaining why GRPO (despite its advantages) may be a dead end due to its requirement for perfectly reproducible parallel rollouts. He shares insights on why LoRAs remain underrated for production deployments, why GEPA and prompt optimization haven't lived up to the hype in his testing, and why the hardest part of deploying agents isn't the AI - it's sandboxing real-world systems with all their bugs and edge cases intact.

The discussion also covers OpenPipe's acquisition by CoreWeave, the launch of their serverless reinforcement learning platform, and Kyle's vision for a future where every deployed agent continuously learns from production experience. He predicts that solving the reliability problem through continuous RL could unlock 10x more AI inference demand from projects currently stuck in development, fundamentally changing how we think about agent deployment and maintenance.
Key Topics:
• The rise and fall of fine-tuning as a business model
• Why 90% of AI projects never reach production
• RULER: Making RL accessible through relative ranking
• The environment problem: Why sandboxing is harder than training
• GRPO vs PPO and the future of RL algorithms
• LoRAs: The underrated deployment optimization
• Why GEPA and prompt optimization disappointed in practice
• Building world models as synthetic training environments
• The $500B Stargate bet and OpenAI's potential crypto play
• Continuous learning as the path to reliable agents

References:
https://www.linkedin.com/in/kcorbitt/
Aug 2023: https://openpipe.ai/blog/from-prompts-to-models
Dec 2023: https://openpipe.ai/blog/mistral-7b-fine-tune-optimized
Jan 2024: https://openpipe.ai/blog/s-lora
May 2024: https://openpipe.ai/blog/the-ten-commandments-of-fine-tuning-in-prod and https://www.youtube.com/watch?v=-hYqt8M9u_M
Oct 2024: https://openpipe.ai/blog/announcing-dpo-support
AIE NYC 2025, Finetuning 500m agents: https://www.youtube.com/watch?v=zM9RYqCcioM&t=919s
AIEWF 2025, How to train your agent (ART-E): https://www.youtube.com/watch?v=gEDl9C8s_-4&t=216s
Sep 2025, acquisition: https://openpipe.ai/blog/openpipe-coreweave
W&B Serverless RL: https://openpipe.ai/blog/serverless-rl?refresh=1760042248153
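RULER's core trick, ranking a group of rollouts relatively instead of scoring each one absolutely, pairs naturally with GRPO-style group advantages. A toy Python sketch (the judge's ordering is hard-coded here, whereas in RULER an LLM judge produces it; the linear scoring scheme is an illustrative assumption):

```python
def ranks_to_advantages(ranking):
    """Convert a judge's relative ranking of N rollouts (best first)
    into mean-centered scores usable as GRPO-style group advantages.
    No absolute reward engineering is needed: only the ordering matters."""
    n = len(ranking)
    # Linear scores: best gets 1.0, worst gets 0.0 (an illustrative choice).
    scores = {traj: (n - 1 - i) / (n - 1) for i, traj in enumerate(ranking)}
    mean = sum(scores.values()) / n
    return {traj: s - mean for traj, s in scores.items()}

# An LLM judge would produce this ordering from the rollout transcripts.
ranking = ["rollout_b", "rollout_d", "rollout_a", "rollout_c"]
adv = ranks_to_advantages(ranking)
print(adv["rollout_b"] > 0 > adv["rollout_c"])  # True: best reinforced, worst penalized
```

Because the advantages are centered within the group, a judge that is merely consistent about ordering is enough; it never has to calibrate an absolute reward scale.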
Oct 07

DevDay 2025: Apps SDK, Agent Kit, MCP, Codex and why Prompting is More Important than Ever

At OpenAI DevDay, we sit down with Sherwin Wu and Christina Huang from the OpenAI Platform Team to discuss the launch of AgentKit, a comprehensive suite of tools for building, deploying, and optimizing AI agents. Christina walks us through the live demo she performed on stage, building a customer support agent in just 8 minutes using the visual Agent Builder, while Sherwin shares insights on how OpenAI is inverting the traditional website-chatbot paradigm by embedding apps directly within ChatGPT through the new Apps SDK.

The conversation explores how OpenAI is tackling the challenges developers face when taking agents to production, from writing and optimizing prompts to building evaluation pipelines. They discuss the decision to adopt Anthropic's MCP protocol for tool connectivity, the importance of visual workflows for complex agent systems, and how features like human-in-the-loop approvals and automated prompt optimization are making agent development more accessible to a broader range of developers.

Sherwin and Christina also reveal how OpenAI is dogfooding these tools internally, with their own customer support at openai.com already powered by AgentKit, and share candid insights about the evolution from plugins to GPTs to this new agent platform. They discuss the surprising persistence of prompting as a critical skill (contrary to predictions from two years ago), the challenges of serving custom fine-tuned models at scale, and why they believe visual agent builders are essential as workflows grow to span dozens of nodes.

Guests:
Sherwin Wu, Head of Engineering, OpenAI Platform
https://www.linkedin.com/in/sherwinwu1/
https://x.com/sherwinwu?lang=en
Christina Huang, Platform Experience, OpenAI
https://x.com/christinaahuang
https://www.linkedin.com/in/christinaahuang/

Thanks very much to Lindsay and Shaokyi for helping us set up this great deep dive into the new DevDay launches!
Key Topics:
• AgentKit launch: Agent SDK, Builder, Evals, and deployment tools
• Apps SDK and the inversion of the app-chatbot paradigm
• Adopting the MCP protocol for universal tool connectivity
• Visual agent building vs code-first approaches
• Human-in-the-loop workflows and approval systems
• Automated prompt optimization and "zero-gradient fine-tuning"
• Service Health Dashboard and achieving five nines reliability
• ChatKit as an embeddable, evergreen chat interface
• The evolution from plugins to GPTs to agent platforms
• Internal dogfooding with Codex and agent-powered support
Sep 11

Context Engineering for Agents - Lance Martin, LangChain

Lance: https://www.linkedin.com/in/lance-martin-64a33b5/
How Context Fails: https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.html
How New Buzzwords Get Created: https://www.dbreunig.com/2025/07/24/why-the-term-context-engineering-matters.html
Context Engineering: https://x.com/RLanceMartin/status/1948441848978309358 and https://rlancemartin.github.io/2025/06/23/context_engineering/
Slides: https://docs.google.com/presentation/d/16aaXLu40GugY-kOpqDU4e-S0hD1FmHcNyF0rRRnb1OU/edit?usp=sharing
Manus Post: https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus
Cognition Post: https://cognition.ai/blog/dont-build-multi-agents
Multi-Agent Researcher: https://www.anthropic.com/engineering/multi-agent-research-system
Human-in-the-loop + Memory: https://github.com/langchain-ai/agents-from-scratch

Bitter Lesson in AI Engineering:
Hyung Won Chung on the Bitter Lesson in AI Research: https://www.youtube.com/watch?v=orDKvo8h71o
Bitter Lesson w/ Claude Code: https://www.youtube.com/watch?v=Lue8K2jqfKk&t=1s
Learning the Bitter Lesson in AI Engineering: https://rlancemartin.github.io/2025/07/30/bitter_lesson/
Open Deep Research: https://github.com/langchain-ai/open_deep_research and https://academy.langchain.com/courses/deep-research-with-langgraph
Scaling and building things that "don't yet work": https://www.youtube.com/watch?v=p8Jx4qvDoSo

Frameworks:
Roast framework at Shopify / standardization of orchestration tools: https://www.youtube.com/watch?v=0NHCyq8bBcM
MCP adoption within Anthropic / standardization of protocols: https://www.youtube.com/watch?v=xlEQ6Y3WNNI
How to think about frameworks: https://blog.langchain.com/how-to-think-about-agent-frameworks/
RAG benchmarking: https://rlancemartin.github.io/2025/04/03/vibe-code/
Simon's talk with memory-gone-wrong: https://simonwillison.net/2025/Jun/6/six-months-in-llms/
Sep 05

A Technical History of Generative Media

Today we are joined by Gorkem and Batuhan from Fal.ai, the fastest-growing generative media inference provider. They recently raised a $125M Series C and crossed $100M ARR. We covered how they pivoted from dbt pipelines to diffusion model inference, the models that really changed the trajectory of image generation, and the future of AI video. Enjoy!

00:00 - Introductions
04:58 - History of Major AI Models and Their Impact on Fal.ai
07:06 - Pivoting to Generative Media and Strategic Business Decisions
10:46 - Technical discussion on CUDA optimization and kernel development
12:42 - Inference Engine Architecture and Kernel Reusability
14:59 - Performance Gains and Latency Trade-offs
15:50 - Discussion of model latency importance and performance optimization
17:56 - Importance of Latency and User Engagement
18:46 - Impact of Open Source Model Releases and Competitive Advantage
19:00 - Partnerships with closed source model developers
20:06 - Collaborations with Closed-Source Model Providers
21:28 - Serving Audio Models and Infrastructure Scalability
22:29 - Serverless GPU infrastructure and technical stack
23:52 - GPU Prioritization: H100s and Blackwell Optimization
25:00 - Discussion on ASICs vs. General Purpose GPUs
26:10 - Architectural Trends: MMDiTs and Model Innovation
27:35 - Rise and Decline of Distillation and Consistency Models
28:15 - Draft Mode and Streaming in Image Generation Workflows
29:46 - Generative Video Models and the Role of Latency
30:14 - Auto-Regressive Image Models and Industry Reactions
31:35 - Discussion of OpenAI's Sora and competition in video generation
34:44 - World Models and Creative Applications in Games and Movies
35:27 - Video Models' Revenue Share and Open-Source Contributions
36:40 - Rise of Chinese Labs and Partnerships
38:03 - Top Trending Models on Hugging Face and ByteDance's Role
39:29 - Monetization Strategies for Open Models
40:48 - Usage Distribution and Model Turnover on FAL
42:11 - Revenue Share vs. Open Model Usage Optimization
42:47 - Moderation and NSFW Content on the Platform
44:03 - Advertising as a key use case for generative media
45:37 - Generative Video in Startup Marketing and Virality
46:56 - LoRA Usage and Fine-Tuning Popularity
47:17 - LoRA ecosystem and fine-tuning discussion
49:25 - Post-Training of Video Models and Future of Fine-Tuning
50:21 - ComfyUI Pipelines and Workflow Complexity
52:31 - Requests for startups and future opportunities in the space
53:33 - Data Collection and RedPajama-Style Initiatives for Media Models
53:46 - RL for Image and Video Models: Unknown Potential
55:11 - Requests for Models: Editing and Conversational Video Models
57:12 - Veo 3 Capabilities: Lip Sync, TTS, and Timing
58:23 - Bitter Lesson and the Future of Model Workflows
58:44 - FAL's hiring approach and team structure
59:29 - Team Structure and Scaling Applied ML and Performance Teams
1:01:41 - Developer Experience Tools and Low-Code/No-Code Integration
1:03:04 - Improving Hiring Process with Public Challenges and Benchmarks
1:04:02 - Closing Remarks and Culture at FAL
Aug 29

Better Data is All You Need — Ari Morcos, Datology

Our chat with Ari shows that data curation is the most impactful and underinvested area in AI. He argues that the prevailing focus on model architecture and compute scaling overlooks the "bitter lesson" that "models are what they eat." Effective data curation, a sophisticated process involving filtering, rebalancing, sequencing (curriculum), and synthetic data generation, allows for training models that are simultaneously faster, better, and smaller. Morcos recounts his personal journey from focusing on model-centric inductive biases to realizing that data quality is the primary lever for breaking the diminishing returns of naive scaling laws. Datology's mission is to automate this complex curation process, making state-of-the-art data accessible to any organization and enabling a new paradigm of AI development where data efficiency, not just raw scale, drives progress.

Timestamps
00:00 Introduction
00:46 What is Datology? The mission to train models faster, better, and smaller through data curation.
01:59 Ari's background: From neuroscience to realizing the "Bitter Lesson" of AI.
05:30 Key Insight: Inductive biases from architecture become less important and even harmful as data scale increases.
08:08 Thesis: Data is the most underinvested area of AI research relative to its impact.
10:15 Why data work is culturally undervalued in research and industry.
12:19 How self-supervised learning changed everything, moving from a data-scarce to a data-abundant regime.
17:05 Why automated curation is superior to human-in-the-loop, citing the DCLM study.
19:22 The "Elephants vs. Dogs" analogy for managing data redundancy and complexity.
22:46 A brief history and commentary on key datasets (Common Crawl, GitHub, Books3).
26:24 Breaking naive scaling laws by improving data quality to maintain high marginal information gain.
29:07 Datology's demonstrated impact: Achieving baseline performance 12x faster.
34:19 The business of data: Datology's moat and its relationship with open-source datasets.
39:12 Synthetic Data Explained: The difference between risky "net-new" creation and powerful "rephrasing."
49:02 The Resurgence of Curriculum Learning: Why ordering data matters in the underfitting regime.
52:55 The Future of Training: Optimizing pre-training data to make post-training more effective.
54:49 Who is training their own models and why (Sovereign AI, large enterprises).
57:24 "Train Smaller": Why inference cost makes smaller, specialized models the ultimate goal for enterprises.
01:00:19 The problem with model pruning and why data-side solutions are complementary.
01:03:03 On finding the smallest possible model for a given capability.
01:06:49 Key learnings from the RC foundation model collaboration, proving that data curation "stacks."
01:09:46 Lightning Round: What data everyone wants & who should work at Datology.
01:14:24 Commentary on Meta's superintelligence efforts and Yann LeCun's role.
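The "models are what they eat" thesis is concrete even in miniature: curation is filtering, deduplication, and rebalancing applied before any training happens. A toy Python sketch (the thresholds and fields are invented for illustration; Datology's actual pipeline is far more sophisticated, including curriculum and synthetic rephrasing):

```python
from collections import Counter

def curate(docs, min_len=20, max_per_domain=2):
    """Toy curation pass: exact-deduplicate, drop low-information docs,
    and rebalance so no single domain dominates the training mix."""
    seen, kept, per_domain = set(), [], Counter()
    for d in docs:
        key = d["text"].strip().lower()
        if key in seen:                                 # deduplication
            continue
        if len(d["text"]) < min_len:                    # crude quality filter
            continue
        if per_domain[d["domain"]] >= max_per_domain:   # rebalancing
            continue
        seen.add(key)
        per_domain[d["domain"]] += 1
        kept.append(d)
    return kept

docs = [
    {"domain": "web", "text": "The quick brown fox jumps over the lazy dog."},
    {"domain": "web", "text": "The quick brown fox jumps over the lazy dog."},
    {"domain": "web", "text": "short"},
    {"domain": "web", "text": "Another long enough web document for the corpus."},
    {"domain": "web", "text": "A third web document that exceeds the domain cap."},
    {"domain": "code", "text": "def add(a, b):\n    return a + b  # trivial example"},
]
print(len(curate(docs)))   # 3: one duplicate, one too-short, one over-cap doc removed
```

Even this crude version shows why curation "breaks" naive scaling laws: every removed document is compute not spent on redundant or low-information tokens.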
Aug 15

Greg Brockman on OpenAI's Road to AGI

Greg Brockman, co-founder and president of OpenAI, joins us to talk about GPT-5 and GPT-OSS, the future of software engineering, why reinforcement learning is still scaling, and how OpenAI is planning to get to AGI.

00:00 Introductions
01:04 The Evolution of Reasoning at OpenAI
04:01 Online vs Offline Learning in Language Models
06:44 Sample Efficiency and Human Curation in Reinforcement Learning
08:16 Scaling Compute and Supercritical Learning
13:21 Wall clock time limitations in RL and real-world interactions
16:34 Experience with ARC Institute and DNA neural networks
19:33 Defining the GPT-5 Era
22:46 Evaluating Model Intelligence and Task Difficulty
25:06 Practical Advice for Developers Using GPT-5
31:48 Model Specs
37:21 Challenges in RL Preferences (e.g., try/catch)
39:13 Model Routing and Hybrid Architectures in GPT-5
43:58 GPT-5 pricing and compute efficiency improvements
46:04 Self-Improving Coding Agents and Tool Usage
49:11 On-Device Models and Local vs Remote Agent Systems
51:34 Engineering at OpenAI and Leveraging LLMs
54:16 Structuring Codebases and Teams for AI Optimization
55:27 The Value of Engineers in the Age of AGI
58:42 Current state of AI research and lab diversity
01:01:11 OpenAI's Prioritization and Focus Areas
01:03:05 Advice for Founders: It's Not Too Late
01:04:20 Future outlook and closing thoughts
01:04:33 Time Capsule to 2045: Future of Compute and Abundance
01:07:07 Time Capsule to 2005: More Problems Will Emerge