PyTorch 2.0 Q&A: Optimizing Transformers for Inference (51:09)
PyTorch 2.0 Q&A: Dynamic Shapes and Calculating Maximum Batch Size (1:30:36)
PyTorch 2.0 Live Q&A Series: A Deep Dive on TorchDynamo (49:53)
How a Transformer Works at Inference vs. Training Time (33:29)
How Does Batching Work on Modern GPUs? (1:29:25)
Namespaces for All - Advanced Session (32:03)
DistServe: Disaggregating Prefill and Decoding for Goodput-Optimized LLM Inference (36:12)
Deep Dive: Optimizing LLM Inference (59:42)