[AI] STORY TIMELINE

BERKELEY RESEARCHERS BREAK TOP AI AGENT BENCHMARKS

Berkeley's RDI team demonstrated critical flaws in leading AI agent benchmarks, achieving near-perfect scores by exploiting structural weaknesses rather than improving actual AI capabilities.

1 SOURCE | FIRST SEEN APR 11, 07:15 PM

Hacker News (+0m)

How We Broke Top AI Agent Benchmarks: And What Comes Next

Article URL: https://rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/
Comments URL: https://news.ycombinator.com/item?…
