Post #577

@MachineLearningResearch

Posted 12/26/2025, 01:35 PM

Introduces Self-play SWE-RL (SSR), showing that software agents can self-improve via self-play RL.
SSR trains a single LLM agent to self-play between bug injection and bug repair, grounded in real-world repositories, with no human-labeled issues or tests.
Bug injection: the agent creates a standard suite of bug artifacts, which are then validated for consistency.
Key validation steps:
1) the original tests must pass;
2) the tests must fail after the bug-injection patch is applied;
3) the weakened tests should pass.
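The three consistency checks for a bug artifact can be sketched as a simple gate over test results. This is a minimal sketch, not SSR's implementation. It assumes a test runner that applies a list of patches to a clean checkout and reports per-test pass/fail. `BugArtifact`, `validate_bug_artifact`, and `toy_run_tests` are hypothetical names. Step 3 is read here as "with the bug still applied, the weakened test suite passes".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Maps test name -> whether it passed.
TestResults = Dict[str, bool]
# Hypothetical runner: applies the given patches to a clean checkout, then runs the suite.
TestRunner = Callable[[List[str]], TestResults]


@dataclass
class BugArtifact:
    """Hypothetical bug artifact; field names are illustrative, not SSR's own schema."""
    bug_patch: str             # patch that injects the bug into the source code
    test_weakening_patch: str  # patch that removes or weakens tests exposing the bug


def all_pass(results: TestResults) -> bool:
    # An empty suite does not count as passing, so vacuous artifacts are rejected.
    return bool(results) and all(results.values())


def validate_bug_artifact(run_tests: TestRunner, artifact: BugArtifact) -> bool:
    # 1) The original tests must pass on the unmodified repository.
    if not all_pass(run_tests([])):
        return False
    # 2) The suite must no longer fully pass once the bug-injection patch is applied.
    if all_pass(run_tests([artifact.bug_patch])):
        return False
    # 3) With the bug still applied, the weakened test suite should pass.
    return all_pass(run_tests([artifact.bug_patch, artifact.test_weakening_patch]))


def toy_run_tests(patches: List[str]) -> TestResults:
    """Toy stand-in for a real test run over a two-test repository (hypothetical)."""
    results = {"test_add": True, "test_sub": True}
    if "break_add" in patches:      # injected bug breaks addition
        results["test_add"] = False
    if "drop_test_add" in patches:  # weakening patch deletes the exposing test
        del results["test_add"]
    return results


print(validate_bug_artifact(toy_run_tests, BugArtifact("break_add", "drop_test_add")))  # True
print(validate_bug_artifact(toy_run_tests, BugArtifact("noop", "drop_test_add")))       # False: bug not detected
print(validate_bug_artifact(toy_run_tests, BugArtifact("break_add", "noop")))           # False: tests not weakened
```

Keeping the runner as an injected function separates the validation logic from how patches are actually applied and executed (for example, in a sandboxed container), which in a real setup would be the expensive part.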