Efficient LLM Inference with SGLang, Lianmin Zheng, xAI (34:11)
GDC 2024 - GPU Work Graphs: Welcome to the Future of GPU Programming (22:30)
vLLM: Easy, Fast, and Cheap LLM Serving, Woosuk Kwon, UC Berkeley (25:55)
Efficient Inference on MI300X: Our Journey at Microsoft, Rajat Monga, CVP AI Frameworks, Microsoft (47:51)
Scaling LLM Batch Inference: Ray Data & vLLM for High Throughput (1:01:21)
Scalable and Efficient Systems for Large Language Models, Lianmin Zheng (Berkeley) (2:26:31)
Efficient LLM Deployment and Serving Meetup - Oct 16, 2024 (32:03)
DistServe: Disaggregating Prefill and Decoding for Goodput-Optimized LLM Inference (30:52)