Deep dive: model merging (part 1) [32:15]
Deep dive: model merging, part 2 [57:45]
Visualizing transformers and attention | Talk for TNG Big Tech Day '24 [36:12]
Deep Dive: Optimizing LLM inference [1:19:27]
Stanford CS25: V3 I Retrieval Augmented Language Models [44:06]
LLM inference optimization: Architecture, KV cache and Flash attention [1:56:53]
When and Why to Fine Tune an LLM [40:54]
Deep dive - Better Attention layers for Transformer models [1:44:31]