[AI]■ STORY TIMELINE
SWE-BENCH VERIFIED LOSES RELEVANCE FOR AI CODING
OpenAI has stopped using SWE-bench Verified as a benchmark for evaluating frontier coding capabilities, signaling that the widely used test no longer reflects the performance levels of advanced AI systems.
Hacker News +0m
Article URL: https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/
Comments URL: https://news.ycombinat…