PyTorch 2.0 Q&A: Optimizing Transformers for Inference (51:09)
PyTorch 2.0 Q&A: Dynamic Shapes and Calculating Maximum Batch Size (33:29)
How does batching work on modern GPUs? (1:30:36)
PyTorch 2.0 Live Q&A Series: A Deep Dive on TorchDynamo (59:42)
PyTorch 2.0 Q&A Series: How and why you should contribute to tutorials and code to PyTorch (1:49:16)
Q/A and review logic of experimental verifiability (32:03)
DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference (55:39)
Understanding LLM Inference | NVIDIA Experts Deconstruct How AI Works (57:45)