Research Paper Deep Dive - The Sparsely-Gated Mixture-of-Experts (MoE) (16:31)
LIMoE: Learning Multiple Modalities with One Sparse Mixture-of-Experts Model (50:03)
Databricks LLM, DBRX: Model Design and Challenges. A lecture for the @BuzzRobot community (28:01)
Understanding Mixture of Experts (1:26:21)
Mistral / Mixtral Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer (52:46)
Miika Aittala: Elucidating the Design Space of Diffusion-Based Generative Models (1:05:44)
Stanford CS25: V1 I Mixture of Experts (MoE) Paradigm and the Switch Transformer (11:45)
Research Paper Deep Dive - Vision GNN: An Image is Worth Graph of Nodes (1:09:58)