Byte Latent Transformer: Patches Scale Better Than Tokens
