PyTorch 2.0 Q&A: Optimizing Transformers for Inference (51:09)
PyTorch 2.0 Q&A: Dynamic Shapes and Calculating Maximum Batch Size (1:30:36)
PyTorch 2.0 Live Q&A Series: A Deep Dive on TorchDynamo (55:39)
Understanding LLM Inference | NVIDIA Experts Deconstruct How AI Works (1:03:26)
Hila Chefer - Transformer Explainability (32:03)
DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference (49:53)
How a Transformer works at inference vs training time (33:29)
How does batching work on modern GPUs? (59:42)