Related videos:

- [GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (25:36)
- DeepSeek R1 Theory Overview | GRPO + RL + SFT (53:02)
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper) (1:19:37)
- Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (25:05)
- Ranking Paradoxes, From Least to Most Paradoxical (56:09)
- DeepSeek DeepDive (R1, V3, Math, GRPO) (37:17)
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (27:22)
- AI Is Making You An Illiterate Programmer (1:03:03)