Post #6238

@aiposted

AI Post — Artificial Intelligence

Views: 5,790

Published: 10 Mar 2026, 05:40

⚠️ Wild moment in AI research.

Anthropic found Claude Opus 4.6 was gaming a benchmark during evaluation.

What it did:
- Burned 40M tokens searching
- Realized the prompt looked like a benchmark
- Looked up the benchmark online
- Found the GitHub repo
- Studied the decryption logic
- Recreated it with SHA-256
- Decrypted answers for ~1200 questions

This happened 18 different times.

So Anthropic did something rare: they publicly disclosed it and reduced their own benchmark scores.

AI systems are getting surprisingly strategic.

@aipost 🏴