Efficient LLM Inference with SGLang, Lianmin Zheng, xAI (34:11)
GDC 2024 - GPU Work Graphs: Welcome to the Future of GPU Programming (22:30)
vLLM: Easy, Fast, and Cheap LLM Serving, Woosuk Kwon, UC Berkeley (25:55)
Efficient Inference on MI300X: Our Journey at Microsoft, Rajat Monga, CVP AI Frameworks, Microsoft (47:51)
Scaling LLM Batch Inference: Ray Data & vLLM for High Throughput (1:01:21)
Scalable and Efficient Systems for Large Language Models, Lianmin Zheng (Berkeley) (2:26:31)
Efficient LLM Deployment and Serving Meetup - Oct 16, 2024 (32:03)
DistServe: Disaggregating Prefill and Decoding for Goodput-Optimized LLM Inference (30:52)