PyTorch 2.0 Q&A: Optimizing Transformers for Inference (51:09)
PyTorch 2.0 Q&A: Dynamic Shapes and Calculating Maximum Batch Size (1:30:36)
PyTorch 2.0 Live Q&A Series: A Deep Dive on TorchDynamo (49:53)
How a Transformer Works at Inference vs. Training Time (33:29)
How Does Batching Work on Modern GPUs? (1:29:25)
Namespaces for All - Advanced Session (32:03)
DistServe: Disaggregating Prefill and Decoding for Goodput-Optimized LLM Inference (36:12)
Deep Dive: Optimizing LLM Inference (59:42)