Post content
Beyond alignment: how reinforcement learning is creating a new generation of ML models with reasoning

A Survey of Reinforcement Learning for Large Reasoning Models
https://arxiv.org/abs/2509.08827
https://github.com/TsinghuaC3I/Awesome-RL-for-LRMs
https://arxiviq.substack.com/p/a-survey-of-reinforcement-learning