Speculative Decoding: When Two LLMs are Faster than One (8:15)
How is Beam Search Really Implemented? (11:54)
How FlashAttention Accelerates Generative AI Revolution (27:14)
Transformers (how LLMs work) explained visually | DL5 (30:13)
LCM: The Ultimate Evolution of AI? Large Concept Models (41:57)
Scalable, Robust, and Hardware-aware Speculative Decoding (1:04:28)
vLLM Office Hours - Speculative Decoding in vLLM - October 3, 2024 (19:46)
Quantization vs Pruning vs Distillation: Optimizing NNs for Inference (57:45)