- RLHF & DPO Explained (In Simple Terms!) (21:15)
- Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning (14:39)
- LoRA & QLoRA Fine-tuning Explained In-Depth (11:29)
- Reinforcement Learning from Human Feedback (RLHF) Explained (26:03)
- Reinforcement Learning: Machine Learning Meets Control Theory (48:46)
- Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math (8:55)
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained (13:23)
- An update on DPO vs PPO for LLM alignment (58:07)