Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math (2:15:13)
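For quick reference, the two formulas this title points at, written in the standard notation of the DPO paper (the video's own notation may differ slightly): the Bradley-Terry model turns a reward difference into a preference probability, and DPO rewrites that reward in terms of policy log probabilities.

$$p(y_w \succ y_l \mid x) = \sigma\big(r(x, y_w) - r(x, y_l)\big)$$

$$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]$$

Here $y_w$ is the preferred completion, $y_l$ the dispreferred one, $\pi_{\mathrm{ref}}$ the frozen reference policy, and $\beta$ controls how far the fine-tuned policy may drift from the reference.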
Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code (1:19:37)
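The optimization problem RLHF solves, in its standard KL-regularized form (given here as a reference point, not necessarily the exact notation used in the video):

$$\max_{\pi_\theta}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}\big[r_\phi(x, y)\big] \;-\; \beta\, \mathbb{D}_{\mathrm{KL}}\!\big[\pi_\theta(y \mid x) \,\|\, \pi_{\mathrm{ref}}(y \mid x)\big]$$

where $r_\phi$ is the learned reward model and the KL term keeps the fine-tuned policy $\pi_\theta$ close to the reference (SFT) policy $\pi_{\mathrm{ref}}$.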
Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (1:10:55)
LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU (58:07)
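Of the components listed in this title, RMS Norm is the most compact to state; as used in LLaMA it rescales each activation by the root mean square of the vector, with a learned gain $g_i$ ($\epsilon$ is a small constant added for numerical stability in implementations):

$$\mathrm{RMSNorm}(x)_i = \frac{x_i}{\sqrt{\tfrac{1}{d} \sum_{j=1}^{d} x_j^2 + \epsilon}}\; g_i$$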
Aligning LLMs with Direct Preference Optimization (50:36)
NLP Lecture 1: Regular Expressions (21:15)
Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning (54:52)
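A minimal sketch of what "fine-tune LLMs directly without reinforcement learning" amounts to in code, assuming per-sequence log probabilities under the policy and the frozen reference model have already been computed; the function and variable names are illustrative, not taken from the video:

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_logp_chosen, pi_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss from per-sequence log probabilities (one value per example in the batch)."""
    # Implicit rewards: beta * log(pi_theta / pi_ref) for the chosen and rejected completions.
    chosen_logratio = pi_logp_chosen - ref_logp_chosen
    rejected_logratio = pi_logp_rejected - ref_logp_rejected
    # Bradley-Terry preference likelihood turned into a logistic loss.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Example with dummy log probabilities for a batch of 4 preference pairs.
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
```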
BERT explained: Training, Inference, BERT vs GPT/LLaMA, Fine-tuning, [CLS] token (1:09:00)