PyTorch 2.0 Q&A: Optimizing Transformers for Inference (51:09)
PyTorch 2.0 Q&A: Dynamic Shapes and Calculating Maximum Batch Size (1:30:36)
PyTorch 2.0 Live Q&A Series: A Deep Dive on TorchDynamo (55:39)
Understanding LLM Inference | NVIDIA Experts Deconstruct How AI Works (1:03:26)
Hila Chefer - Transformer Explainability (32:03)
DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference (49:53)
How a Transformer works at inference vs training time (33:29)
How does batching work on modern GPUs? (59:42)