PyTorch Expert Exchange: Efficient Generative Models: From Sparse to Distributed Inference (29:18)
PyTorch Expert Exchange: Hacker Cup AI (32:03)
DistServe: Disaggregating Prefill and Decoding for Goodput-Optimized LLM Inference (24:21)
Running State-of-Art Gen AI Models on-Device with NPU Acceleration - Felix Baum, Qualcomm (35:50)
Efficient Streaming Language Models with Attention Sinks (33:29)
How Does Batching Work on Modern GPUs? (28:05)
Building Scientific Computing Infrastructure Software with the PyTorch Ecosystem - Bharath Ramsundar (9:26)
Together Goes Brrr: Threading Research & Production with Torch Compile - Pragaash Ponnusamy (8:50)