AI Inference: Why Speed Matters More Than You Think (with SambaNova's Kwasi Ankomah)

The Neuron: AI Explained - A podcast by The Neuron

Everyone's talking about the AI datacenter boom right now. Billion dollar deals here, hundred billion dollar deals there. Well, why do data centers matter? It turns out, AI inference (actually calling the AI and running it) is the hidden bottleneck slowing down every AI application you use (and new stuff yet to be released). In this episode, Kwasi Ankomah from SambaNova Systems explains why running AI models efficiently matters more than you think, how their revolutionary chip architecture delivers 700+ tokens per second, and why AI agents are about to make this problem 10x worse.

💡 This episode is sponsored by Gladia's Solaria - the speech-to-text API built for real-world voice AI. With sub-270ms latency, 100+ languages supported, and 94% accuracy even in noisy environments, it's the backbone powering voice agents that actually work. Learn more at gladia.io/solaria

🔗 Key Links:
• SambaNova Cloud: https://cloud.sambanova.ai
• Check out Solaria speech to text API: https://www.gladia.io/solaria
• Subscribe to The Neuron newsletter: https://theneuron.ai

🎯 What You'll Learn:
• Why inference speed matters more than model size
• How SambaNova runs massive models on 90% less power
• Why AI agents use 10-20x more tokens
• The best open source models right now
• What to watch for in AI infrastructure

➤ CHAPTERS
Timecode - Chapter Title
0:00 - Intro
2:14 - What is AI Inference?
3:19 - Why Inference is the Real Challenge
9:18 - A message from our sponsor, Gladia Solaria
10:16 - The 95% ROI Problem Discussion
13:47 - SambaNova's Revolutionary Chip Architecture
15:19 - Running DeepSeek's 670B Parameter Models
18:11 - Developer Experience & Platform
21:26 - AI Agents and the Token Explosion
24:33 - Model Swapping and Cost Optimization
31:30 - Energy Efficiency 10kW vs 100kW
36:13 - Future of AI Models Bigger vs Smaller
39:24 - Best Open Source Models Right Now
46:01 - AI Infrastructure Next 12 Months
47:09 - Agents as Infrastructure
50:28 - Human-in-the-Loop and Trust
52:55 - Closing and Resources

Article Written by: Grant Harvey
Hosted by: Corey Noles and Grant Harvey
Guest: Kwasi Ankomah
Published by: Manique Santos
Edited by: Adrian Vallinan
