PyTorch Expert Exchange: Efficient Generative Models: From Sparse to Distributed Inference (32:03)
DistServe: disaggregating prefill and decoding for goodput-optimized LLM inference (29:18)
PyTorch Expert Exchange Hacker Cup AI (30:54)
Diffusion models from scratch in PyTorch (24:21)
Running State-of-Art Gen AI Models on-Device with NPU Acceleration - Felix Baum, Qualcomm (44:06)
LLM inference optimization: Architecture, KV cache and Flash attention (1:03:03)
NVIDIA CEO Jensen Huang's Vision for the Future (35:50)
Efficient Streaming Language Models with Attention Sinks (33:29)