PyTorch 2.0 Q&A: Optimizing Transformers for Inference (51:09)
PyTorch 2.0 Q&A: Dynamic Shapes and Calculating Maximum Batch Size (33:29)
How does batching work on modern GPUs? (1:30:36)
PyTorch 2.0 Live Q&A Series: A Deep Dive on TorchDynamo (59:42)
PyTorch 2.0 Q&A Series: How and why you should contribute to tutorials and code to PyTorch (1:49:16)
Q/A and review logic of experimental verifiability (32:03)
DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference (55:39)
Understanding LLM Inference | NVIDIA Experts Deconstruct How AI Works (57:45)