How Zyphra went all-in on AMD + Why Devs feel faster with AI but are slower — with Quentin Anthony
In this episode, we dive into the technical and strategic decisions behind building high-performance AI models on alternative hardware, as Quentin Anthony shares insights from his work at Zyphra and EleutherAI. From rethinking GPU ecosystems to pioneering edge-optimized architectures, the conversation reveals how low-level engineering choices are shaping the future of accessible, efficient AI deployment.
Quentin Anthony discusses Zyphra's shift to AMD MI300X GPUs, whose superior memory capacity and bandwidth enable faster training than NVIDIA H100s for certain workloads. He emphasizes writing custom kernels in ROCm, and even in assembly when necessary, bypassing high-level frameworks for maximum control. Zyphra's hybrid Mamba-transformer models, like Zamba 2, achieve competitive performance at smaller parameter counts, making them ideal for edge devices from phones to desktops. Quentin reflects on being a rare success case in the METR AI productivity study, advocating disciplined AI use: avoiding context rot by restarting chats and delegating only well-scoped subtasks. He critiques overreliance on AI for core reasoning and highlights the value of velocity-focused engineering teams. Open-source research at EleutherAI prioritizes interpretability, while he argues companies should invest in developer ecosystems over traditional marketing to drive real innovation.
02:27
MI300X can beat H100 in non-FP8 dense transformers due to high VRAM and memory bandwidth
05:08
Instead of using Triton, we write kernels directly in ROCm and expose them via PyTorch.
12:29
Kernel datasets could improve model training but are not a silver bullet due to validation challenges.
19:38
Training inference-efficient models while considering future hardware compatibility.
26:51
High-quality models are preferred over faster local inference when there's a performance trade-off.
29:11
AI speeds up work only in specific cases, and only with disciplined digital hygiene practices.
45:23
Blindly trusting AI in development can lead to high-cost mistakes.
47:25
Physicists are the "embryonic stem cells" of engineers due to their problem-solving adaptability.
