How to Train LLMs to "Think" (o1 & DeepSeek-R1)