#006 - The Subtle Art of Inference with Adam Grzywaczewski

About this title

In this episode of The Private AI Lab, Johan van Amersfoort speaks with Adam Grzywaczewski, a senior Deep Learning Data Scientist at NVIDIA, about the rapidly evolving world of AI inference.


They explore how inference has shifted from simple, single-GPU execution to highly distributed, latency-sensitive systems powering today’s large language models. Adam explains the real bottlenecks teams face, why software optimization and hardware innovation must move together, and how NVIDIA’s inference stack—from TensorRT-LLM to Dynamo—enables scalable, cost-efficient deployments.


The conversation also covers quantization, pruning, mixture-of-experts models, AI factories, and why inference optimization is becoming one of the most critical skills in modern AI engineering.


Topics covered


  • Why inference is now harder than training

  • Autoregressive models and KV-cache challenges

  • Mixture-of-experts architectures

  • NVIDIA Dynamo and TensorRT-LLM

  • Hardware vs software optimization

  • Quantization, pruning, and distillation

  • Latency vs throughput trade-offs

  • The rise of AI factories and DGX systems

  • What’s next for AI inference
