DPO to TPO: Test-Time Preference Optimization (RL) · Minideo

26:19

Goodbye RAG - Smarter CAG w/ KV Cache Optimization

38:41

FUSION: Knowledge GRAPHS are more than TOOLS for LLM

1:19:37

Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

27:22

AI Is Making You An Illiterate Programmer

48:46

Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math

19:11

ULTIMATE Fact Checking AI (Johns Hopkins, Stanford)

28:44

SMARTER: AI Reasoning w Knowledge Graphs + Agents

34:05

Code CoT w/ Self-Evolution LLM: rStar-Math Explained