Understanding the LLM Inference Workload - Mark Moyou, NVIDIA
vLLM: Easy, Fast, and Cheap LLM Serving for Everyone - Woosuk Kwon & Xiaoxuan Liu, UC Berkeley
Exploring the Latency/Throughput & Cost Space for LLM Inference // Timothée Lacroix // CTO Mistral
Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou
Build everything with AI agents: here's how
LLM inference optimization: Architecture, KV cache and Flash attention
Fast LLM Serving with vLLM and PagedAttention
Transformers (how LLMs work) explained visually | DL5