Reinforcement Learning for Reasoning in LLMs with One Training Example arxiv.org 2 points by chrsw 11 hours ago