Attention is all you need (Transformer) - Model explanation (including math), Inference and Training
Coding a Transformer from scratch on PyTorch, with full explanation, training and inference. (2:59:24)
Visualizing transformers and attention | Talk for TNG Big Tech Day '24 (57:45)
How I use LLMs (2:11:12)
Variational Autoencoder - Model, ELBO, loss function and maths explained easily! (27:12)
LLM inference optimization: Architecture, KV cache and Flash attention (44:06)
The math behind Attention: Keys, Queries, and Values matrices (36:16)
Flash Attention derived and coded from first principles with Triton (Python) (7:38:18)