Grokking: Generalization beyond Overfitting on small algorithmic datasets (Paper Explained) [48:30]
Gradients are Not All You Need (Machine Learning Research Paper Explained) [29:51]
New Discovery: LLMs have a Performance Phase [44:20]
PonderNet: Learning to Ponder (Machine Learning Research Paper Explained) [27:02]
Finally: Grokking Solved - It's Not What You Think [37:17]
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention [20:18]
Why Does Diffusion Work Better than Auto-Regression? [54:39]
Rethinking Attention with Performers (Paper Explained) [27:48]