Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

1:21:39
DeepSeek-V3

32:31
Round and Round We Go! What makes Rotary Positional Encodings useful?

3:31:24
Deep Dive into LLMs like ChatGPT

12:24
Majorana 1 Explained: The Path to a Million Qubits

1:16:31
CMU Advanced NLP Spring 2025 (11): Reinforcement Learning

57:45
Visualizing transformers and attention | Talk for TNG Big Tech Day '24

26:26
10 Years Later: Software Opinions I’ve Completely Changed

59:24