Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained
19:39
RLHF & DPO Explained (In Simple Terms!)
48:46
Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math
11:22
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution – Paper Explained
8:57
RAG vs. Fine Tuning
6:37
REPA Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You ...
58:07
Aligning LLMs with Direct Preference Optimization
8:48
Large Language Models explained briefly
19:48