Reinforcement learning from human feedback (NLP817 12.3)
13:10
The difference between GPT and ChatGPT (NLP817 12.2)
8:25
Reinforcement Learning from scratch
59:17
RLHF: How to Learn from Human Feedback with Reinforcement Learning
10:17
Reinforcement Learning through Human Feedback - EXPLAINED! | RLHF
23:26
Rotary Position Embedding explained deeply (w/ code)
19:32
Reinforcement Learning - My Algorithm vs State of the Art
29:05
Policy Gradient Methods | Reinforcement Learning Part 6
10:36