PyTorch Expert Exchange: Efficient Generative Models: From Sparse to Distributed Inference (29:18)
PyTorch Expert Exchange: Hacker Cup AI (32:03)
DistServe: Disaggregating Prefill and Decoding for Goodput-Optimized LLM Inference (24:21)
Running State-of-Art Gen AI Models on-Device with NPU Acceleration - Felix Baum, Qualcomm (35:50)
Efficient Streaming Language Models with Attention Sinks (33:29)
How Does Batching Work on Modern GPUs? (28:05)
Building Scientific Computing Infrastructure Software with the PyTorch Ecosystem - Bharath Ramsundar (9:26)
Together Goes Brrr: Threading Research & Production with Torch Compile - Pragaash Ponnusamy (8:50)